Introduction


An MITx page for this class will be up by this afternoon. That's where most of the content for this class will come 
from (regular class meetings are recitations). 

Lecture videos on the MITx page are divided into 7 to 25 minute sections, each covering some particular topic. 
After each block, there are some “lecture exercises:” this class is structured in a way so that before each recitation, we 
should have finished going over the material designated for it. We'll also have real problems in problem sets — there 
are about 11 of these over the semester, and they're also going to be online. 

Homework is the best predictor of how we're going to do in this course — if we basically understand the homework 
completely, we're extremely likely to do very well. Homework on the MITx page is given to us problem by problem, 
part by part, but we'll also be given a PDF of the problems that we can complete like a regular problem set. Exams will be held in the evening, and there will be no recitation on exam days.

This class also has a graduate TA (Matt Hodel) and an MITx site administrator (Michelle Tomasik), and we should 
ask them if we have questions. 

Grading is 10 percent lecture questions, 30 percent online homework, 15 percent for each of the two midterms 
(March 11 and April 22), and 25 percent from the final. It’s pretty likely we'll be able to drop one problem set. Because 
this is an online course, we have a bit more flexibility: if we need a one-day extension, it's easy for the system to give that. Beyond that, we should go through S^3 (Student Support Services).

In principle, no textbook is required for this class, but Griffiths and Shankar are good references — it’s good to read 
things in a different way. There will be lecture notes, and students seem to like them. 

MIT has been a bit reluctant to accept that online classes are equivalent to normal lecture-based classes, so coming to recitation is generally mandatory. When this class ran in previous years, a strict rule did not have to be imposed, but attendance sheets are passed around. It will make a difference, and it's important!

Scheduled office hours will be posted by the end of today; there will be about 2 to 3 hours per week. Other times are also welcome by appointment.


1 February 3, 2020 


8.05 is being changed a bit. The issue being corrected is that the addition of angular momentum is usually the last topic of the semester, and it always ends up a little rushed even though it's a fairly difficult topic. So we'll shift it a bit forward, and there will be at least three more lectures at the end about the density matrix (a topic that jumped between 8.05 and 8.06 and eventually got dropped from both). To make room, the parts of 8.05 that were 8.04 review will go away: there will be an 8.04 review section on the MITx page, but it won't be covered in class anymore. We can use it at our discretion, or we can proceed directly to unit 1.


Fact 1 


This class assumes pretty good knowledge of 8.04, so we'll move to new material very quickly. 


Wednesday's exercises will not be due, but next week's Monday, Wednesday, and Friday lectures will be graded. 


We're really hitting the ground running. 


Fact 2 

8.051 is a class based on online lectures, so we go through the lectures at our own pace. However, I will still transcribe notes from each lecture and insert them in the appropriate spots of the class: lectures 1 and 2 will be placed in the notes after this recitation (that is, before recitation 2), and all future lectures will be placed before their corresponding recitations. This means that the flow of the class material approximately corresponds to the flow of the notes.


With the last half hour or so of this first recitation, we'll talk a bit about states in quantum mechanics. 


Example 3 (Two-state system) 


Consider a quantum system with two basis states $\psi_+$ and $\psi_-$.


This means that any state in this system can be written as a superposition of the two basis states. (Recall that the $\psi$ notation tells us that these are [complex-valued] wavefunctions.) Because these are wavefunctions, they are normalized so that
\[ (\psi_+, \psi_+) = 1, \quad (\psi_-, \psi_-) = 1, \]
and it's nice if the two basis states are also orthogonal (making them orthonormal), so that
\[ (\psi_+, \psi_-) = 0. \]
Recall that we often define the inner product between two wavefunctions as
\[ (\phi(x), \psi(x)) = \int dx \, \phi^*(x) \psi(x), \]
so that $(\psi, \psi) = \int dx \, |\psi|^2 = 1$.


Question 4. How many real parameters do we need to characterize a general state in this quantum system? That is, how many real parameters are needed to describe the inequivalent states (that is, states that aren't proportional by some complex constant)?
(Survey: 1, 3, 6, 3 votes for 1, 2, 3, and 4.) Let's go through and solve this: a general superposition looks like
\[ \psi(x) = c_+ \psi_+ + c_- \psi_-, \quad c_+, c_- \in \mathbb{C}. \]


But many of these states are physically equivalent: right now, we have two complex numbers, which means four real parameters. But a state like $\psi$ is equivalent to $2\psi$, which is equivalent to $(1 + 3i)\psi$, and so on: we normalize the state to get the physics! So we need $(\psi, \psi) = 1$, which means
\[ (c_+ \psi_+ + c_- \psi_-, \; c_+ \psi_+ + c_- \psi_-) = 1. \]
We can expand out the inner product, noticing that constants from the left entry come out with a complex conjugate, to find that
\[ c_+^* c_+ (\psi_+, \psi_+) + c_+^* c_- (\psi_+, \psi_-) + c_-^* c_+ (\psi_-, \psi_+) + c_-^* c_- (\psi_-, \psi_-) = 1. \]
And now using our orthonormality conditions, this tells us that
\[ |c_+|^2 + |c_-|^2 = 1. \]
This is a real constraint, so it removes one parameter.


But are we done? Let's use our new information to rewrite
\[ \psi = a e^{i\alpha_+} \psi_+ + b e^{i\alpha_-} \psi_-, \]
where $a, b \ge 0$ and $\alpha_+, \alpha_-$ are real. So now our condition tells us that $a^2 + b^2 = 1$, but now we can see another redundancy: remember that we can rescale by an overall phase as well! Multiplying by $e^{-i\alpha_+}$ shows that this state is actually equivalent to
\[ \psi = a \psi_+ + e^{i(\alpha_- - \alpha_+)} b \psi_-, \]
which means we really only have 2 free parameters: one for the phase difference and one for the magnitude of the $\psi_-$ coefficient.


In summary, let's rewrite this in a slightly nicer way:
\[ \psi = a \psi_+ + e^{i\theta} b \psi_-, \]
where $a, b \ge 0$ and $a^2 + b^2 = 1$, which can also be written as
\[ \psi = \cos(x) \psi_+ + e^{i\theta} \sin(x) \psi_-, \]
where $\theta \in [0, 2\pi]$ and $x \in [0, \frac{\pi}{2}]$ (because both cos and sin have to be nonnegative).
Here's one way to never forget the characterization of this state: think about spherical coordinates. Every point on a sphere has coordinates $\theta$ and $\phi$, with $\theta \in [0, \pi]$ and $\phi \in [0, 2\pi]$. So we could write
\[ \psi = \cos\left(\frac{\theta}{2}\right) \psi_+ + e^{i\phi} \sin\left(\frac{\theta}{2}\right) \psi_-, \]
where $\theta/2$ now plays the role of $x$ above and $\phi$ plays the role of the relative phase. So we can think of the space of states as the surface of a 2-sphere in 3-dimensional space: for every direction on our 2-sphere, we have a state, which is a linear combination of our “up state” $\psi_+$ and “down state” $\psi_-$.
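To make the parameter counting concrete, here is a minimal Python sketch. The function names `two_state` and `norm_squared` are our own (not from the lecture), and we assume the two basis states are orthonormal, so a state is fully described by its pair of complex coefficients. The sketch builds the coefficients from the two sphere angles and checks that the resulting state is automatically normalized.

```python
import cmath
import math

def two_state(theta, phi):
    """Coefficients (c_plus, c_minus) of
    psi = cos(theta/2) psi_+ + e^{i phi} sin(theta/2) psi_-."""
    c_plus = complex(math.cos(theta / 2))
    c_minus = cmath.exp(1j * phi) * math.sin(theta / 2)
    return c_plus, c_minus

def norm_squared(c_plus, c_minus):
    # For an orthonormal basis, (psi, psi) = |c_+|^2 + |c_-|^2.
    return abs(c_plus) ** 2 + abs(c_minus) ** 2

north = two_state(0.0, 0.0)        # theta = 0: the "up" state psi_+
south = two_state(math.pi, 0.0)    # theta = pi: the "down" state psi_- (up to rounding)
generic = two_state(1.0, 2.5)      # an arbitrary point on the sphere

print(norm_squared(*generic))      # close to 1 for any choice of angles
```

Only two real numbers (`theta` and `phi`) go in, which matches the count of 2 free parameters found above.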


2 The Variational Method and Introduction to Stern-Gerlach 


We'll start this class by discussing the variational problem, which might be new to some of us. This is connected to a field called the calculus of variations, which is a pretty complicated topic, and we won't get into a lot of the difficulties. The main idea of calculus of variations is to look at maxima and minima of a functional instead of a function. (In ordinary calculus, we try to find a point where some quantity is maximized, but here we'll be trying to find a function that maximizes some quantity.) So things are a bit more challenging!


Fact 5 
Calculus of variations seems to have originated with Newton. The question he considered was to start with a cross-sectional area and try to taper it in such a way that the resistance from a viscous fluid flowing through is minimized. This is complicated, because our ultimate goal is to find a shape rather than just a single number or point (as we do in ordinary calculus).


Fact 6 
Another famous problem, called the brachistochrone problem, asks us to design a curve from point A to point B in a plane such that an object (under the influence of gravity) traverses the path in the least time. This was a difficult problem for many mathematicians back then, including Leibniz, Newton, and the Bernoullis!


So how is this calculus of variations idea related to quantum mechanics? It seems that special functions are often 
the minimizer of some important quantity, and we have some special functions in quantum mechanics: the energy 
eigenfunctions. It's then natural to ask whether there’s a quantity that they minimize, and indeed the answer is yes. 


So let's start setting up our problem: we're trying to solve the energy eigenstate equation
\[ \hat{H} \psi = E \psi, \]
where $\hat{H}$ is some Hamiltonian (in any number of dimensions) and $\psi$ is a wavefunction (which can be a function of a vector $\vec{x}$ or just a single $x$ in one dimension). It'll take a while for us to get to the “minimum or maximum” answer, so we'll start with a simpler question: determining something about the ground state energy.


Theorem 7 (Variational principle) 
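Let $\psi(\vec{x})$ be an arbitrary normalized wavefunction (it doesn't need to solve the Schrödinger equation), meaning that $\int |\psi|^2 \, d\vec{x} = 1$. Then we have the upper bound on the ground state energy
\[ \int \psi^*(\vec{x}) \hat{H} \psi(\vec{x}) \, d\vec{x} \equiv \langle \hat{H} \rangle_\psi \ge E_{gs}. \]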


This equation doesn't tell us the exact value of the ground state energy, but it gives us an upper bound for any arbitrary function that we try putting in! And the strategy when we apply this principle is to plug in functions that we think look like the wavefunction of a bound state. (Those $\psi(x)$ functions that we plug in are called trial wavefunctions.)


Remark 8. Unfortunately, there isn't a very good way of determining how good our bound is just from this equation; we'll develop some better tools later on.


“Proof”. We'll make an assumption, which basically says that we don't have a continuous spectrum of energy eigenvalues. This is only used so that the proof is easier to write down: it means we can list our energies as
\[ E_{gs} = E_1 \le E_2 \le E_3 \le \cdots, \]
and we know that our Hamiltonian acts on the energy eigenstates via
\[ \hat{H} \psi_n = E_n \psi_n. \]
By completeness of the energy eigenstates, we can write our trial wavefunction
\[ \psi(x) = \sum_{n \ge 1} b_n \psi_n(x) \]
as a superposition of energy eigenstates. It's important to note that $\psi$ here does not solve the Schrödinger equation; it's just something we invented. Now, we know that $\psi$ is normalized, so
\[ 1 = \int |\psi|^2 \, dx = \sum_{n \ge 1} |b_n|^2, \]
where we've used orthonormality of the energy eigenstates. And expanding out the expression for the expectation value of the Hamiltonian $\langle \hat{H} \rangle_\psi$ will yield
\[ \int \psi^* \hat{H} \psi \, dx = \sum_{n \ge 1} |b_n|^2 E_n. \]
(This takes a few lines, but we're basically using orthonormality again.) But we're now almost done: since $E_n \ge E_1$ for every $n$, the above expression is lower bounded by
\[ \sum_{n \ge 1} |b_n|^2 E_n \ge \sum_{n \ge 1} |b_n|^2 E_1 = E_1 \cdot 1 = E_{gs}, \]
as desired.


Our next step is to make a more general statement of the variational principle, since it's not always convenient for us to have normalized wavefunctions. But we know that for an unnormalized wavefunction $\psi$, we can create the normalized wavefunction
\[ \frac{\psi(x)}{\sqrt{\int |\psi|^2 \, dx}}, \]
and this should satisfy the inequality in the variational principle. Thus, we find that
\[ \frac{\int \psi^* \hat{H} \psi \, dx}{\int \psi^* \psi \, dx} \ge E_{gs}, \]
and this is actually nicer because we don't need to work with such a restricted set of functions.


Fact 9 


It is true that our trial function $\psi$ cannot be completely arbitrary: it must be normalizable so that the denominator here is well-defined and finite.


We can introduce the notation
\[ F[\psi] = \frac{\int \psi^* \hat{H} \psi \, dx}{\int \psi^* \psi \, dx}, \]
and this $F$ is called a functional (which inputs a function and outputs a number). Basically, given a trial wavefunction $\psi(x)$, which is a function, we compute a number, which is an upper bound on the ground state energy.

So we want to find the “critical point” of a functional: it seems that the ground state energy will be the minimum value of the functional. And in fact, this functional is minimized exactly by the ground state wavefunction! (This might seem dizzying, because a function is specified by infinitely many numbers, so critical points are hard to visualize.) But it turns out that every eigenstate is actually a critical point of the functional (though the non-ground states are saddle points), which we'll show in our homework.


Example 10 


Consider the delta function potential
\[ V(x) = -\alpha \delta(x), \quad \alpha > 0. \]


The ground state energy is well-known here, since we've solved this problem in 8.04 already: we have
\[ E_{gs} = -\frac{m \alpha^2}{2 \hbar^2}. \]
But let's assume we don't know what the ground-state wavefunction should look like, and we use a Gaussian trial wavefunction
\[ \psi(x) = e^{-\frac{1}{2} \beta^2 x^2}. \]


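(The reason for the $\beta$ is to reap more of the benefits of our calculation to get a better bound: we can adjust our parameter $\beta$ to get the best possible bound on our energy.) So let's start calculating: we can find the denominator (from a standard Gaussian integral) to be
\[ \int |\psi|^2 \, dx = \int e^{-\beta^2 x^2} \, dx = \frac{\sqrt{\pi}}{\beta}, \]
and then we need to find
\[ \int \psi^* \hat{H} \psi \, dx = \int dx \, e^{-\frac{1}{2} \beta^2 x^2} \left( -\frac{\hbar^2}{2m} \frac{d^2}{dx^2} - \alpha \delta(x) \right) e^{-\frac{1}{2} \beta^2 x^2}, \]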
where we've substituted in the Hamiltonian with the given potential. This is not so fun, and in general in 8.05, we can use Mathematica or Maple or MATLAB to make our life easier, as long as we think we could theoretically do the calculation ourselves. Well, the delta function part of the integral just gives us $-\alpha$, because we pick out the value at 0 with the delta function, and we actually want to integrate by parts for the term with the second derivative. This is because the kinetic term then reduces to
\[ \frac{\hbar^2}{2m} \int dx \left( \frac{d}{dx} e^{-\frac{1}{2} \beta^2 x^2} \right)^2, \]
which is an easier integral to evaluate. And if we carry out all of the calculations, the final answer we get is that (remembering to bring in the term from the denominator)
\[ F[\psi] = -\frac{\beta}{\sqrt{\pi}} \alpha + \frac{\beta^2 \hbar^2}{4m}. \]


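This expression is 0 at $\beta = 0$, and it's 0 at some other positive value of $\beta$. Specifically, this is a quadratic function of $\beta$, and we can find its minimum to get the best possible upper bound on the ground state energy. Explicitly, the minimum is at $\beta = \frac{2 m \alpha}{\sqrt{\pi} \hbar^2}$, so the variational principle tells us that
\[ E_{gs} \le \min_\beta \left( \frac{\beta^2 \hbar^2}{4m} - \frac{\beta \alpha}{\sqrt{\pi}} \right) = -\frac{m \alpha^2}{\pi \hbar^2} = \frac{2}{\pi} \left( -\frac{m \alpha^2}{2 \hbar^2} \right). \]
And the term in parentheses on the right is the true ground state energy, so our bound is off by a factor of $\frac{2}{\pi} \approx 0.64$, which is somewhat close!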
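As a sanity check on this example, here is a small Python sketch that minimizes the Gaussian-trial functional $F(\beta) = \frac{\beta^2 \hbar^2}{4m} - \frac{\beta \alpha}{\sqrt{\pi}}$ on a grid and compares the result with the exact ground state energy $-\frac{m \alpha^2}{2 \hbar^2}$. The choice of units $\hbar = m = \alpha = 1$, the grid, and all variable names are our own assumptions for illustration.

```python
import math

# Assumption: work in units where hbar = m = alpha = 1.
HBAR = M = ALPHA = 1.0

def energy_functional(beta):
    # F(beta) = beta^2 hbar^2 / (4 m) - beta alpha / sqrt(pi)
    return beta**2 * HBAR**2 / (4 * M) - beta * ALPHA / math.sqrt(math.pi)

# Crude grid search over the variational parameter beta in (0, 3].
betas = [k * 1e-4 for k in range(1, 30001)]
best_beta = min(betas, key=energy_functional)
best_bound = energy_functional(best_beta)

exact = -M * ALPHA**2 / (2 * HBAR**2)                 # true ground state energy
analytic_bound = -M * ALPHA**2 / (math.pi * HBAR**2)  # minimum of F found by hand

print(best_beta, best_bound, best_bound / exact)
```

The grid minimum should agree with the hand calculation, the bound should sit above the exact energy, and the ratio of the two should be close to $2/\pi$.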

With that, we'll move on to a new topic: that of Stern-Gerlach devices and the spin 1/2 system. Spin 1/2 will keep us busy for a good chunk of the semester, and we'll go into a lot of detail. Today, we'll just give the beginning of the story, and our descriptions will become more elaborate as time goes on. We'll start with the experiment which led to the discovery of spin, and we'll then describe how to construct a physical theory out of that. Afterwards, we'll spend some time on linear algebra, which will provide us with some mathematical tools for studying this physical model.


Fact 11 


The Stern-Gerlach experiment was done in 1922 in Frankfurt, and it wasn't clear for a while why it was being done. The background is that Pauli thought electrons had two degrees of freedom but didn't know what they were. Kronig suggested that it had to do with some kind of rotation, but Pauli thought this made no sense. Meanwhile, Uhlenbeck and Goudsmit (in 1925) had the same idea, and Ehrenfest (their advisor) also thought this made no sense but still let them publish it. So that's why they have credit for discovering the spin of an electron.


Stern and Gerlach were atomic physicists who were actually interested in measuring the speed of thermal motion of ions by using magnetic fields to deflect beams of these ions. They were experts at this who had heard of Bohr's claim that an electron might have angular momentum, and they tried to detect it. When the experiment was run, they did see something, but the electrons in the silver atoms of those experiments actually had no orbital angular momentum, only spin! So there was a lot of confusion. Let's try to describe what exactly they saw and extract the quantum mechanics out of that.


First of all, we don't see spin directly in an experiment: we see magnetic moments.


Definition 12 


A magnetic moment $\vec{\mu}$ is the magnetic analog of an electric dipole, and it is given by the formula
\[ \vec{\mu} = I \vec{A}. \]


Roughly what's going on here is that we can imagine a loop in the plane with some current $I$, and there is some normal vector $\vec{A}$ whose magnitude is the area enclosed by that loop. Looking at units, it turns out that $\mu B$ has units of energy, so we can define the units
\[ [\mu] = \frac{\text{Joule}}{\text{Tesla}}. \]
Let's consider another situation in which a magnetic moment might come up: say we have a ring of charge of radius $R$ with some total charge $Q$, which means we have a linear (uniform) charge density $\lambda = \frac{Q}{2 \pi R}$. In addition, say that this ring is rotating with some velocity $v$ and has some mass $M$. Then we want to compute this magnetic moment, because there happens to be a fundamental relation between angular momentum and magnetic moments!


To understand that, first note that the current satisfies
\[ I = \lambda v = \frac{Q v}{2 \pi R} \]
(the rotating ring creates a charge “moving” through the wire in our frame), and thus the magnetic moment is
\[ \mu = I A = \frac{Q v}{2 \pi R} \cdot \pi R^2 = \frac{1}{2} Q v R. \]


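This is an okay answer, but it depends on the radius and velocity of our ring. So let's compare it to the magnitude of the angular momentum, which is $\vec{r} \times \vec{p}$ summed over each individual part of the ring, giving $L = M v R$. Thus,
\[ \mu = \frac{1}{2} \frac{Q}{M} (M v R) = \frac{Q}{2M} L, \quad \text{or as vectors} \quad \vec{\mu} = \frac{Q}{2M} \vec{L}. \]
And all of the “incidentals” like the radius and velocity of the ring have dropped out. The relation is thus universal for any ring, and thus it also holds for any axially symmetric object built out of such rings, like a hollow sphere or ellipsoid (as long as the charge and mass are distributed in the same way)!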
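The claim that the "incidentals" drop out is easy to check numerically. The following Python sketch (with a helper name `ring_moment_and_L` and sample values that are purely illustrative) computes the magnetic moment $\mu = IA$ and the angular momentum $L = MvR$ of a rotating charged ring for several radii and speeds, and looks at the ratio $\mu / L$.

```python
import math

def ring_moment_and_L(Q, M, R, v):
    """Magnetic moment and angular momentum of a uniformly charged rotating ring."""
    current = Q * v / (2 * math.pi * R)  # I = lambda * v with lambda = Q / (2 pi R)
    area = math.pi * R**2                # area enclosed by the ring
    mu = current * area                  # mu = I A = Q v R / 2
    L = M * v * R                        # every piece of the ring has r x p of size (dm) v R
    return mu, L

Q, M = 2.0, 4.0  # illustrative total charge and mass (arbitrary units)
ratios = []
for R, v in [(0.1, 2.0), (1.0, 5.0), (3.0, 0.5)]:
    mu, L = ring_moment_and_L(Q, M, R, v)
    ratios.append(mu / L)

print(ratios)  # each entry should equal Q / (2 M), independent of R and v
```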
With this in mind, we can now speculate that a particle might have a magnetic moment if there is a little ball of charge rotating inside it. This was exactly what Pauli didn't like about the model, though, so let's take a more quantum mechanical approach to this. We'll consider a single particle, and now we'll replace $\vec{L}$ with $\vec{S}$, the spin angular momentum of the particle (since this particle is no longer rotating around another object). So if this particle is, for instance, an electron, it's natural to ask whether we have
\[ \vec{\mu} = \frac{e}{2 m_e} \vec{S} = \frac{e \hbar}{2 m_e} \frac{\vec{S}}{\hbar}. \]


If this were true, it would be a quantum analog of the classical statement. (The second equality is just to make the $\vec{S}/\hbar$ term unitless.) It turns out this equation isn't quite true, but it's very close! The actual answer turns out to be that
\[ \vec{\mu} = g \frac{e \hbar}{2 m_e} \frac{\vec{S}}{\hbar}, \]
where $g$ is sometimes called the Landé factor. We can usually calculate $g$; it turns out that the electron's g-factor is equal to 2 (up to small corrections). So the magnetic moment is twice what we predict in the classical case (and this is predicted by the Dirac equation, the relativistic equation of the electron). Defining $\mu_B = \frac{e \hbar}{2 m_e} \approx 9.3 \times 10^{-24}$ J/Tesla (here $B$ stands for Bohr magneton), we have
\[ \vec{\mu} = -2 \mu_B \frac{\vec{S}}{\hbar}, \]
where the negative sign comes from the negatively charged electron.


Remark 13. Protons and neutrons have a more complicated system, because they are made up of quarks interacting in odd ways. (And the magnetic moment of a proton or neutron is much smaller, because the mass in the denominator of the analog of $\mu_B$ is much larger.)
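To get a feel for the sizes involved, here is a short Python sketch that evaluates the Bohr magneton $\frac{e \hbar}{2 m_e}$ from standard SI values of the constants, together with the same combination using the proton mass (usually called the nuclear magneton, a name not used in these notes). The constant values are CODATA figures typed in by hand, and the variable names are our own.

```python
# Physical constants in SI units (CODATA values, entered by hand).
E_CHARGE = 1.602176634e-19      # elementary charge, C
HBAR = 1.054571817e-34          # reduced Planck constant, J s
M_ELECTRON = 9.1093837015e-31   # electron mass, kg
M_PROTON = 1.67262192369e-27    # proton mass, kg

mu_B = E_CHARGE * HBAR / (2 * M_ELECTRON)  # Bohr magneton, J/T
mu_N = E_CHARGE * HBAR / (2 * M_PROTON)    # same formula with the proton mass, J/T

print(mu_B)         # about 9.27e-24 J/T
print(mu_B / mu_N)  # equals the proton-to-electron mass ratio, roughly 1836
```

This only compares the natural units of magnetic moment: the actual proton and neutron moments also carry their own g-factors, which this sketch does not attempt to model.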


With this, our next question is to think about how these magnetic moments behave under magnetic fields. 


Example 14 


Suppose we have a loop of charge in the plane, rotating counterclockwise, and magnetic field lines are coming up out of the plane and diverging (so that the magnetic field is weaker above the loop). Does the loop feel a force up or down?


To solve this, we look at two diametrically opposite points and calculate the force $I \, d\vec{\ell} \times \vec{B}$ on the current at both points. Both have a vertical component pointing down and the horizontal components cancel out, so the net force is pointing down, and in fact
\[ \vec{F} = \nabla(\vec{\mu} \cdot \vec{B}) \]
(which was derived in 8.02). We can indeed see that this equation is consistent: the force goes in the direction that makes $\vec{\mu} \cdot \vec{B}$ largest. And this problem becomes simpler if the magnetic field is mostly in the z-direction.

But with that, let’s return to the Stern-Gerlach experiment. Recall that silver atoms have 47 electrons — 46 of 
these fill out the lower energy levels, and there is a lone 5s electron which is out in its own spherical shell. An s 
state electron has zero orbital angular momentum, so throwing a silver atom through an apparatus is essentially like 
throwing a single electron (that is, throwing spins), because everything else cancels out! 

As far as the experiment is concerned, what we're actually throwing is dipole moments: magnetic fields push these dipole moments up and down. So here's the apparatus that was being used: silver atoms are shot in a beam, and we insert a magnet such that the magnetic field lines, mostly in the z-direction, have a slight gradient. Then these silver atoms hit a screen after the magnet, and we can track where they land.


In such an apparatus, we can assume that the force will be approximately in the z-direction:
\[ \vec{F} = \nabla(\vec{\mu} \cdot \vec{B}) \approx \nabla(\mu_z B_z) \approx \mu_z \frac{\partial B_z}{\partial z} \hat{z}. \]
(It turns out that the gradient in the other directions averages out!) Classically, the magnetic moments should be distributed in random directions, so we'd expect to get a smudge of z-components when the silver hits the screen.

But the shock was that there were actually two separate peaks! The magnetic field being used in this experiment 
was about 0.1 Tesla, and the space quantization that was seen between the two peaks was about 0.2 millimeters. So 
this was clear enough for everyone to observe it, and this caused a bit of confusion. 

At the end of the day, people realized that this didn't come from angular momentum of the electron around the atom, and the concept of spin came back. It took a while for the details to be resolved, but the idea is that
\[ \mu_z = -2 \mu_B \frac{S_z}{\hbar}, \]
and experiments suggested that the true value of the dimensionless term was
\[ \frac{S_z}{\hbar} = \pm \frac{1}{2}. \]
Such a particle is known as a spin 1/2 particle! With our 8.04 knowledge, we can think of this quantum mechanically 
as our states being a superposition of a “spin up” and a “spin down” state. And when the particle goes through the 
magnetic field, it splits into these two beams, and the wavefunction collapses when we observe it (that is, when the 
silver atom hits a screen). 

To finish this lecture, let's do a few thought experiments with our Stern-Gerlach apparatus. We can represent this with a box: a beam (an inward arrow) goes in, and two beams (represented by outward arrows) come out, with values $S_z = \frac{\hbar}{2}$ and $S_z = -\frac{\hbar}{2}$. This box measures the value of $S_z$, and it splits our beam.


Example 15 


Take a Stern-Gerlach device and block the lower beam (so only the $S_z = \frac{\hbar}{2}$ beam passes through). Feed that beam into another Stern-Gerlach (z) device. What happens?


We see experimentally that nothing comes out from the bottom of the second Stern-Gerlach device: all particles have $S_z = \frac{\hbar}{2}$. Quantum mechanically, we can think of this as having the quantum states
\[ |z;+\rangle, \quad |z;-\rangle. \]
We'll be thinking of these as our basis states: any other state, even a state that points along the x-direction, is a superposition of these two states. And this is a big assumption to make: we're saying that the set of spin states is a two-dimensional complex vector space, where linear combinations of $|z;+\rangle$ and $|z;-\rangle$ can represent all configurations. So in algebra, the experiment is telling us that a $|z;+\rangle$ state has no minus component, or equivalently that
\[ \langle z;-|z;+\rangle = 0. \]
(These are orthogonal basis states.) We can also similarly write that
\[ \langle z;+|z;-\rangle = 0, \quad \langle z;\pm|z;\pm\rangle = 1. \]
This might seem a little strange to us, because there's a bit of a conflict with “orthogonality” here: it turns out that “up” and “down” are orthogonal, not antiparallel! And this is one of the most confusing parts of working with spin 1/2 particles: we shouldn't think of this overlap expression above as a regular dot product.


Example 16 


Again, start with a z-filter Stern-Gerlach device with the bottom beam blocked, and feed this into an x-filter Stern-Gerlach device. This turns out to give us $S_x = \frac{\hbar}{2}$ and $S_x = -\frac{\hbar}{2}$ with equal probability.

This means that spin states along the x and z directions do have some overlap: they aren't orthogonal! In particular, since the two outcomes are equally likely, we get the expression
\[ |\langle x;+|z;+\rangle|^2 = |\langle x;-|z;+\rangle|^2 = \frac{1}{2}. \]
(We'll be more precise with all of this notation later.)


Example 17 


Take the system from the previous example, and this time block particles with $S_x = \frac{\hbar}{2}$. So the exiting beam has all particles satisfying $S_x = -\frac{\hbar}{2}$: what happens when we feed this into a z-filter Stern-Gerlach machine?

It seems possible that all of these particles that we've filtered have both $S_x = -\frac{\hbar}{2}$ and $S_z = \frac{\hbar}{2}$, since we've filtered for both qualities. But that doesn't happen! We get both $S_z = \pm\frac{\hbar}{2}$, so the “memory of the state” from the first Stern-Gerlach device has been destroyed. We'll talk more formally about all of this next time, and we'll try to discuss more about the relations between our different states.
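The three thought experiments can be mimicked with a few lines of Python, treating each spin state as a pair of complex coefficients in the z-basis and each detection probability as a squared overlap. This is only a sketch under assumptions: the specific coefficients used for the x-states, $(1, \pm 1)/\sqrt{2}$, are the standard choice and are not derived in these notes so far, and the helper names `inner` and `probability` are our own.

```python
import math

def inner(bra_state, ket_state):
    # Overlap <bra|ket>: conjugate the coefficients of the left state.
    return sum(a.conjugate() * b for a, b in zip(bra_state, ket_state))

def probability(outcome, state):
    # Probability that a particle in `state` exits in the `outcome` beam.
    return abs(inner(outcome, state)) ** 2

Z_PLUS = (1 + 0j, 0j)
Z_MINUS = (0j, 1 + 0j)

s = 1 / math.sqrt(2)
X_PLUS = (s + 0j, s + 0j)     # assumed form of |x;+>
X_MINUS = (s + 0j, -s + 0j)   # assumed form of |x;->

# Example 15: z-filter followed by a second z-device.
p15_down = probability(Z_MINUS, Z_PLUS)
# Example 16: z-filter followed by an x-device.
p16 = (probability(X_PLUS, Z_PLUS), probability(X_MINUS, Z_PLUS))
# Example 17: keep only the x-minus beam, then measure along z again.
p17 = (probability(Z_PLUS, X_MINUS), probability(Z_MINUS, X_MINUS))

print(p15_down, p16, p17)
```

With these assumed states, nothing comes out of the lower port in Example 15, while Examples 16 and 17 both split evenly, which is the "lost memory" effect described above.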


3 Spin One-Half, Bras, Kets, and Operators 


Last time, we spoke about the Stern-Gerlach experiment, and we discussed how a sequence of Stern-Gerlach boxes 
could help us extract properties of the spin 1/2 system. The biggest surprise was that these Stern-Gerlach devices 
split our magnetic moments into two beams (basically forcing them to point in one of two opposite directions). Today, 
we'll talk more about how to represent the set of states as a two-dimensional vector space, and we'll set up all of 
the machinery that will be necessary. Even though we haven't quite discussed all of the linear algebra concepts, we'll 
assume some vague ideas today — mathematical formalism will come soon. 


Recall that the possible states of the silver atom (really, an electron) can be described by the two states $|z;+\rangle$ and $|z;-\rangle$, corresponding to angular momenta $S_z = \frac{\hbar}{2}$ and $-\frac{\hbar}{2}$ respectively. (The z-label here indicates that we've passed our atoms through a z-filter Stern-Gerlach device.) We can ask whether the state $|z;+\rangle$ has an angular momentum in the x- or y-direction, and we'll be able to answer that soon.


Proposition 18 


Saying that $S_z = \frac{\hbar}{2}$ really means that there is an operator $\hat{S}_z$ with
\[ \hat{S}_z |z;+\rangle = \frac{\hbar}{2} |z;+\rangle. \]

Operators often come up in quantum mechanics because they represent measurement of some sort! And here $\hat{S}_z$ also acts on the other state and gives
\[ \hat{S}_z |z;-\rangle = -\frac{\hbar}{2} |z;-\rangle. \]




An operator acting on a state gives another state, and the nice thing in this case is that the states on the left and right are just scalar multiples of each other. Such a state is known as an eigenstate of the operator, and eigenstates have corresponding eigenvalues (such as the $\frac{\hbar}{2}$ and $-\frac{\hbar}{2}$ above).

As mentioned last time, the underlying assumption we're using here is that these two states are enough. In other words, if we do this experiment again with an x-filter Stern-Gerlach device, $|x;+\rangle$ and $|x;-\rangle$ are also valid states. But we're postulating a theory of spin in which $|z;+\rangle$ and $|z;-\rangle$ are basis states, so
\[ |\psi\rangle = c_1 |z;+\rangle + c_2 |z;-\rangle \]
(for some complex numbers $c_1, c_2$) for any spin state $|\psi\rangle$. This is called a two-dimensional complex vector space, because we have two basis vectors and complex coefficients. (The $|z;+\rangle$ object doesn't quite look like a vector, but it's what we call a ket, and we'll make the correspondence clear in subsequent lectures.)

Letting $|z;+\rangle$ be the first basis state and $|z;-\rangle$ be the second, we should be clear that these vectors are not “complex:” a complex vector space means the coefficients, not the vectors, are complex numbers. To be more concrete, we'll use a representation (that is, some way of exhibiting a vector in a more familiar way). Then if we define $|z;+\rangle = |1\rangle$ and $|z;-\rangle = |2\rangle$, we will represent
\[ |z;+\rangle = |1\rangle \longleftrightarrow \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad |z;-\rangle = |2\rangle \longleftrightarrow \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \]
and now our states look like column vectors! In particular, we can now represent any state as a two-dimensional column vector
\[ |\psi\rangle = c_1 |1\rangle + c_2 |2\rangle \longleftrightarrow \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}. \]
To proceed, we'll return to an example of the Stern-Gerlach experiment. One thing we did was filter the first machine 
so that all particles are in a |z;+) state — if we feed this beam into another z-filter Stern-Gerlach apparatus, then all 
of the states will be in the + state. (There's zero probability of being in the - state.) The way we'll represent this in 
mathematical terms is that those two states are orthogonal, and to explain that, we'll need to talk about bras and 
kets. For now, though, we'll just explain the basics. 


What we're saying is that the bra-ket or overlap of the + and - states is zero:
\[ \langle z;-|z;+\rangle = 0. \]
Similarly, the states are “well-normalized,” so
\[ \langle z;+|z;+\rangle = 1. \]
We can make analogous equations when the right entry is $z;-$, and we can now use the notation $|1\rangle$ and $|2\rangle$ usefully by summarizing all four equations as
\[ \langle i|j\rangle = \delta_{ij}. \]
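We haven't really defined what these “bras” are yet, so let's start with a working definition. We associate the ket $|1\rangle$ with the column vector $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$, and similarly we'll associate the bra $\langle 1|$ with the row vector $\begin{pmatrix} 1 & 0 \end{pmatrix}$. (This means the bra $\langle 2|$ will be thought of as the row vector $\begin{pmatrix} 0 & 1 \end{pmatrix}$.) In slightly more generality, we can write a state
\[ |\alpha\rangle = \alpha_1 |1\rangle + \alpha_2 |2\rangle \longleftrightarrow \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}. \]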



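Similarly, we can associate
\[ |\beta\rangle = \beta_1 |1\rangle + \beta_2 |2\rangle \longleftrightarrow \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}. \]
And now we can define the corresponding bra for $\alpha$ to be
\[ \langle \alpha| = \alpha_1^* \langle 1| + \alpha_2^* \langle 2| \longleftrightarrow \begin{pmatrix} \alpha_1^* & \alpha_2^* \end{pmatrix}, \]
where $\alpha_1^*$ denotes the complex conjugate of $\alpha_1$.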


Fact 19 


We'll make all of these definitions more axiomatically soon — this is all just to give us some intuition. 


And now we can define the bra-ket $\langle \alpha|\beta\rangle$, which is a number. Ultimately the reason for the complex conjugation above is to make sure $\langle \alpha|\alpha\rangle$ is a nonnegative real number (it's a “length squared”). And the reasonable way for us to get a definition of this is to take the matrix multiplication of the representatives
\[ \langle \alpha|\beta\rangle = \begin{pmatrix} \alpha_1^* & \alpha_2^* \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} = \alpha_1^* \beta_1 + \alpha_2^* \beta_2. \]
We'll soon see that this is actually an inner product. Vectors that satisfy the above $\langle i|j\rangle = \delta_{ij}$ inner product relation are known as orthonormal (because each is normalized, and they're orthogonal to each other).


Example 20 


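We can check, for example, that taking inner products like $\langle 1|2\rangle$ with this matrix definition gives the same value as the relation $\langle i|j\rangle = \delta_{ij}$: $\langle 1|2\rangle = \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = 0$.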
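The working definition of bras and bra-kets translates directly into code. In this Python sketch (the helper names `bra` and `braket` and the sample coefficients are our own choices), a ket is a tuple of two complex numbers, the bra is its complex-conjugated version, and the bra-ket is the row-times-column product.

```python
def bra(ket):
    # The bra associated with a ket: complex-conjugate each coefficient.
    return tuple(c.conjugate() for c in ket)

def braket(alpha, beta):
    # <alpha|beta> = alpha_1^* beta_1 + alpha_2^* beta_2
    return sum(a * b for a, b in zip(bra(alpha), beta))

KET_1 = (1 + 0j, 0j)
KET_2 = (0j, 1 + 0j)

alpha = (1 + 2j, 3 - 1j)   # illustrative coefficients alpha_1, alpha_2
beta = (2 - 1j, 1j)        # illustrative coefficients beta_1, beta_2

ab = braket(alpha, beta)   # <alpha|beta>
ba = braket(beta, alpha)   # <beta|alpha>, the complex conjugate of <alpha|beta>
aa = braket(alpha, alpha)  # "length squared": real and nonnegative

print(ab, ba, aa)
```

Swapping the two arguments conjugates the result, and the overlap of a state with itself comes out real, which is exactly why the conjugation is built into the bra.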


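With this, we can now return to the idea of representing our states as column vectors by thinking about our operator again. The only object that naturally acts on two-component vectors is a 2 x 2 matrix, so we're going to claim that our spin operator can be written as
\[ \hat{S}_z = \frac{\hbar}{2} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \]
And this can be verified now that we're writing our states as column vectors: for example,
\[ \hat{S}_z |1\rangle = \frac{\hbar}{2} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \frac{\hbar}{2} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \frac{\hbar}{2} |1\rangle, \]
as desired. And we can also check that the action $\hat{S}_z |2\rangle = -\frac{\hbar}{2} |2\rangle$ is correct, and the idea is that we don't need to check any more vectors: since the operator behaves correctly on basis vectors, it will behave correctly (by linearity) on any arbitrary vector in this space.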
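The "checking basis vectors is enough" argument can be illustrated with a short Python sketch. We assume units with $\hbar = 1$ and pick an arbitrary pair of coefficients; the helper names (`apply`, `combine`, `close`) are ours. The sketch applies the matrix for $\hat{S}_z$ to a superposition directly, and separately builds the same result from its action on the two basis kets.

```python
HBAR = 1.0  # assumption: units in which hbar = 1

# Matrix representation of S_z in the basis |1> = |z;+>, |2> = |z;->.
S_Z = ((HBAR / 2, 0.0),
       (0.0, -HBAR / 2))

KET_1 = (1 + 0j, 0j)
KET_2 = (0j, 1 + 0j)

def apply(matrix, ket):
    # Matrix times column vector.
    return tuple(sum(m * c for m, c in zip(row, ket)) for row in matrix)

def combine(c1, ket1, c2, ket2):
    # The linear combination c1 |ket1> + c2 |ket2>.
    return tuple(c1 * a + c2 * b for a, b in zip(ket1, ket2))

def close(ket_a, ket_b, tol=1e-12):
    return all(abs(a - b) < tol for a, b in zip(ket_a, ket_b))

c1, c2 = 0.6, 0.8j                      # illustrative coefficients
psi = combine(c1, KET_1, c2, KET_2)

direct = apply(S_Z, psi)                # act on the superposition directly
by_linearity = combine(c1, apply(S_Z, KET_1), c2, apply(S_Z, KET_2))

print(direct, by_linearity)
```

Both routes give the same column vector, since matrix multiplication is linear.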

But we're not quite done yet: remember that in one of the experiments from last time, we wanted to know about what happens when we measure the spin states along the x-direction. In other words, how do we know that we can come up with numbers $c_1, c_2$ such that $c_1 |z;+\rangle + c_2 |z;-\rangle$ points along the x-direction? So we need to invent something new. This is always a difficult process, and there are many different approaches we can take. We won't be using Feynman's approach of rotating Stern-Gerlach machines: instead, we'll think about angular momentum again.

Basically, we'll compare spin angular momentum with orbital angular momentum: what we really care about is the operators $\hat{S}_x$ and $\hat{S}_y$. Remember that we have, in the classical (orbital) case, the angular momenta $\hat{L}_x, \hat{L}_y, \hat{L}_z$. And these are a lot easier to work with: they look like $\hat{x} \hat{p}_y - \hat{y} \hat{p}_x$, and we know how that kind of object works on wavefunctions. (This is a lot nicer than $\hat{S}_z$, which is working in a different kind of space.)

The key idea is that $L_z$ is a Hermitian operator. This is good, because it means we have a good observable. And $S_z$ is Hermitian as well, which means that its complex conjugate transpose is equal to itself (we'll talk a lot more about this later). One useful property of the $L$'s is that we have the commutator relation

$$[L_i, L_j] = i\hbar\,\epsilon_{ijk} L_k$$

(where $[A, B] = AB - BA$), if we denote $L_x, L_y, L_z = L_1, L_2, L_3$ respectively. This is called the algebra of angular momentum, and in the relation above we're using index notation, meaning that we're summing over $k = 1, 2, 3$, and $\epsilon_{ijk}$ is 1 if $(i, j, k)$ is an even permutation of $(1, 2, 3)$, $-1$ for an odd permutation, and 0 otherwise (we'll get more practice with this later).


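So we can write these formulas out explicitly:

$$[L_x, L_y] = i\hbar L_z, \quad [L_y, L_z] = i\hbar L_x, \quad [L_z, L_x] = i\hbar L_y.$$

So our goal will be to find an analog of this for $S_x, S_y, S_z$. Specifically, our goal will be that

$$[S_x, S_y] = i\hbar S_z, \quad [S_y, S_z] = i\hbar S_x, \quad [S_z, S_x] = i\hbar S_y.$$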
So we need to determine what our Hermitian 2 × 2 matrices look like. By definition, we need them to look like

$$\begin{pmatrix} 2c & a - ib \\ a + ib & 2d \end{pmatrix},$$

where a, b, c, d are real (this is because taking the complex conjugate of all entries and then swapping the off-diagonal entries must give back the same matrix). So being Hermitian is some kind of “reality” condition, and the 2c and 2d are just for convenience later on.

To make progress, note that we're trying to find matrices $S_x, S_y$ that satisfy commutation relations. If there are any identity matrix terms, they will commute with everything (so they don't contribute at all). So we'll remove the “identity matrix” part from this:

$$\begin{pmatrix} 2c & a - ib \\ a + ib & 2d \end{pmatrix} - (c + d)\,1_{2\times 2} = \begin{pmatrix} c - d & a - ib \\ a + ib & d - c \end{pmatrix}.$$

Remember that we already have the Hermitian matrix $S_z$: it has a number on the top diagonal entry and the opposite number on the bottom diagonal entry. We want $S_x, S_y$ to be “independent” from $S_z$, so we should kill the diagonal terms. This leaves us with just

$$\begin{pmatrix} 0 & a - ib \\ a + ib & 0 \end{pmatrix}, \quad \text{which we can rewrite as} \quad a\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} + b\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix},$$

where a and b are real numbers.


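Proposition 21

What's funny here is that we can think of Hermitian matrices as forming a vector space: adding two Hermitian matrices still gives a Hermitian matrix, and multiplying a Hermitian matrix by a real number still gives us something Hermitian.

So the set of 2 × 2 Hermitian matrices is a real vector space with four basis vectors:

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$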
So these four matrices are the “linearly independent Hermitian matrices”. These are quite famous, but let's first finish up our problem. It sounds like we've found two potential matrices for $S_x$ and $S_y$, but we don't know what the scale factor is. So we need a bit of physics: the eigenvalues of the other two operators should also be $\pm\frac{\hbar}{2}$, because we could have done all of the Stern-Gerlach experiment by thinking of a different direction.

So there are some sign issues (the answer isn't completely unique), but luckily, everyone uses the same convention for these matrices $S_x, S_y$ (though the identities would be preserved if we used slightly different matrices as well).

And from here, our attention should turn to eigenvalues and eigenvectors. The matrix $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ has two eigenvalues: $\lambda = 1$ for the eigenvector $\frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}$, and $\lambda = -1$ for the eigenvector $\frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix}$. Similarly, $\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$ has $\lambda = 1$ for the eigenvector $\frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix}$, and $\lambda = -1$ for $\frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \end{pmatrix}$. The factors of $\frac{1}{\sqrt{2}}$ here are to make sure our column vectors are normalized: their lengths should be 1, where length is defined in terms of the bra-ket inner product. And now that our eigenvalues are $\pm 1$ for these matrices, it's natural to try

$$S_x = \frac{\hbar}{2}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad S_y = \frac{\hbar}{2}\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}.$$

(Multiplying matrices by numbers multiplies the eigenvalues by those numbers as well.) And to check that these indeed work, we check the commutator relations. Let's do an example:


Example 22 


What is the commutator of S, and S,? 


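We can pull out the $\frac{\hbar}{2}$ factors to get

$$[S_x, S_y] = \frac{\hbar^2}{4}\left[\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} - \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\right] = \frac{\hbar^2}{4}\left[\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix} - \begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix}\right].$$

Simplifying this indeed yields

$$[S_x, S_y] = \frac{\hbar^2}{4}\begin{pmatrix} 2i & 0 \\ 0 & -2i \end{pmatrix} = i\hbar \cdot \frac{\hbar}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = i\hbar S_z.$$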


And we can double check that the other commutator relations hold, and now we've found our three matrices for our spin states along the x, y, and z directions! These are extremely important, and they're important enough that we have the definition

$$S_i = \frac{\hbar}{2}\sigma_i,$$

where the $\sigma_i$ are the Pauli matrices

$$\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
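As a quick numerical cross-check of the commutation relations, here is a minimal Python sketch. It assumes units with ħ = 1 and stores the 2 × 2 matrices as nested lists; the helper names (`matmul`, `commutator`, `spin_algebra_holds`) are illustrative choices and not part of the lecture.

```python
# Sketch: check [S_i, S_j] = i*hbar*S_k for the three cyclic index choices.
# Assumption: units with hbar = 1; all helper names are illustrative.
HBAR = 1.0

SIGMA = [
    [[0, 1], [1, 0]],      # sigma_1
    [[0, -1j], [1j, 0]],   # sigma_2
    [[1, 0], [0, -1]],     # sigma_3
]
# Spin operators S_i = (hbar / 2) * sigma_i.
SPIN = [[[HBAR / 2 * x for x in row] for row in s] for s in SIGMA]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def commutator(a, b):
    """[A, B] = AB - BA."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]


def is_close(a, b, tol=1e-12):
    """Entry-by-entry comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(2) for j in range(2))


def spin_algebra_holds():
    """True if [S_x,S_y], [S_y,S_z], [S_z,S_x] match i*hbar*S_z, S_x, S_y."""
    for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
        expected = [[1j * HBAR * x for x in row] for row in SPIN[k]]
        if not is_close(commutator(SPIN[i], SPIN[j]), expected):
            return False
    return True


print(spin_algebra_holds())
```

Only the cyclic index orders are checked here; the remaining cases follow from antisymmetry of the commutator.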


This is enough to give us the answer to almost all experiments we could do with the Stern-Gerlach apparatus! For example, we said that

$$S_x |x;\pm\rangle = \pm\frac{\hbar}{2} |x;\pm\rangle,$$

but we also already know the eigenstates for this operator! And thus

$$|x;+\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{2}}\left(|z;+\rangle + |z;-\rangle\right)$$

corresponds to the eigenvector with eigenvalue 1 in $\sigma_1$, and

$$|x;-\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \frac{1}{\sqrt{2}}\left(|z;+\rangle - |z;-\rangle\right)$$

corresponds to the eigenvector with eigenvalue −1 in $\sigma_1$. So these aren't new states that we need to add to the state space: they're linear combinations of the states we already have! (And similarly, this means that we can write $|z;+\rangle$ in terms of $|x;+\rangle$ and $|x;-\rangle$ if we want to.)


So let's return to one of the questions: what happens to a $|z;+\rangle$ state under an x-filter? The overlap of an $|x;+\rangle$ state with the $|z;+\rangle$ state is just going to be

$$\langle x;+|z;+\rangle = \frac{1}{\sqrt{2}},$$

and the same number comes up for $\langle x;-|z;+\rangle$. These amplitudes are equal, so the probabilities are indeed $\frac{1}{2}$ each, which is what we want!
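The overlap computation above can be sketched in a few lines of Python. This is only an illustration: states are stored as two-component tuples in the z-basis, and the names `inner` and `probability` are hypothetical helpers introduced here.

```python
import math

# Column vectors in the z-basis: (1, 0) is |z;+> and (0, 1) is |z;->.
Z_PLUS = (1, 0)
X_PLUS = (1 / math.sqrt(2), 1 / math.sqrt(2))
X_MINUS = (1 / math.sqrt(2), -1 / math.sqrt(2))


def inner(bra, ket):
    """Bra-ket inner product <bra|ket>: the first argument is conjugated."""
    return sum(complex(b).conjugate() * k for b, k in zip(bra, ket))


def probability(outcome, state):
    """Probability |<outcome|state>|^2, assuming both are normalized."""
    return abs(inner(outcome, state)) ** 2


p_plus = probability(X_PLUS, Z_PLUS)
p_minus = probability(X_MINUS, Z_PLUS)
print(p_plus, p_minus)  # both should be 1/2 up to floating-point rounding
```

The conjugation in `inner` makes no difference for these real vectors, but it matters as soon as the y-states (which have imaginary components) are involved.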


And of course, we can construct the y-states in a very similar way: $S_y$ has eigenstates $|y;\pm\rangle$ such that

$$S_y |y;\pm\rangle = \pm\frac{\hbar}{2} |y;\pm\rangle,$$

and this will give us

$$|y;\pm\rangle = \frac{1}{\sqrt{2}}\left(|z;+\rangle \pm i\,|z;-\rangle\right).$$

It's very important here that we're working with complex numbers: that's the only way we can get so many linear combinations that are all orthogonal!
So we've described a theory, and our goal will be to expand it now. We're now able to produce a state along the x, y, and z-directions; let's see if we can expand this to producing states along any unit vector $\vec{n} = (n_x, n_y, n_z)$. To make some progress on this question, consider the triplet of operators $\vec{S} = (S_x, S_y, S_z)$. (We could write this out as $S_x\hat{x}$ and so on, but that doesn't really make any sense other than as an accounting procedure.) The idea now is to consider the spin operator

$$S_n = \vec{n}\cdot\vec{S} = n_x S_x + n_y S_y + n_z S_z.$$

Indeed, if this vector $\vec{n}$ points in the z-direction, we have $(n_x, n_y, n_z) = (0, 0, 1)$ and we do recover $S_z$. A similar thing holds for x and y, so this is the spin operator in the direction of the vector $\vec{n}$.


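We're going to want $S_n$ to have the same eigenvalues of $\pm\frac{\hbar}{2}$ as our fundamental spin operators: to make that clear, we'll use spherical coordinates

$$n_z = \cos\theta, \quad n_x = \sin\theta\cos\phi, \quad n_y = \sin\theta\sin\phi.$$

Then we can just do some computation:

$$S_n = \vec{n}\cdot\vec{S} = \frac{\hbar}{2}\left(n_x\sigma_1 + n_y\sigma_2 + n_z\sigma_3\right) = \frac{\hbar}{2}\begin{pmatrix} n_z & n_x - i n_y \\ n_x + i n_y & -n_z \end{pmatrix}.$$

And now plugging in the spherical coordinates makes this simplify very nicely: we actually just have

$$S_n = \frac{\hbar}{2}\begin{pmatrix} \cos\theta & e^{-i\phi}\sin\theta \\ e^{i\phi}\sin\theta & -\cos\theta \end{pmatrix}.$$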


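For completeness, let's calculate the eigenvectors and eigenvalues here: remember that in order to find an eigenvalue $\lambda$ of a matrix A, we solve the equation $\det(A - \lambda I) = 0$. So we want

$$\det\begin{pmatrix} \frac{\hbar}{2}\cos\theta - \lambda & \frac{\hbar}{2}e^{-i\phi}\sin\theta \\ \frac{\hbar}{2}e^{i\phi}\sin\theta & -\frac{\hbar}{2}\cos\theta - \lambda \end{pmatrix} = 0.$$

This turns out to not be very bad: the phases cancel out, the determinant expands to $\lambda^2 - \frac{\hbar^2}{4}(\cos^2\theta + \sin^2\theta)$, and we indeed do get $\lambda = \pm\frac{\hbar}{2}$. But the eigenvectors are more nontrivial: to find an eigenvector in the $\vec{n}$-direction, we need to find vectors $|\vec{n};\pm\rangle$ such that

$$S_n |\vec{n};\pm\rangle = \pm\frac{\hbar}{2} |\vec{n};\pm\rangle.$$

In other words, for the plus state we want to find a vector $\begin{pmatrix} c_1 \\ c_2 \end{pmatrix}$ such that

$$S_n\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \frac{\hbar}{2}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix}.$$

We can factor out the $\frac{\hbar}{2}$'s, and what we end up needing to solve is that

$$\begin{pmatrix} \cos\theta - 1 & e^{-i\phi}\sin\theta \\ e^{i\phi}\sin\theta & -\cos\theta - 1 \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = 0.$$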
This gives two equations that relate $c_1$ and $c_2$, and they actually tell us the same information (this is exactly the purpose of having our eigenvalues)! Either way, what we find is that

$$c_2 = e^{i\phi}\,\frac{1 - \cos\theta}{\sin\theta}\,c_1,$$

and we can simplify this by using the half-angle identities on both the numerator and denominator of our fraction. We end up finding that

$$c_2 = e^{i\phi}\,\frac{\sin\frac{\theta}{2}}{\cos\frac{\theta}{2}}\,c_1,$$

and if we want a normalized eigenvector with $|c_1|^2 + |c_2|^2 = 1$, it turns out that we need $|c_1|^2 = \cos^2\frac{\theta}{2}$. And it doesn't really matter what phase we choose, so let's keep it simple:

$$c_1 = \cos\frac{\theta}{2}, \quad c_2 = \sin\frac{\theta}{2}\,e^{i\phi},$$

and we've found our state

$$|\vec{n};+\rangle = \cos\frac{\theta}{2}\,|z;+\rangle + \sin\frac{\theta}{2}\,e^{i\phi}\,|z;-\rangle.$$

In other words, we've found the spin state that points in the $\vec{n}$-direction as a linear superposition of our basis states! And we can check that setting $\theta = 0$ gives us the z-axis, and this does indeed recover the $|z;+\rangle$ state.


A similar calculation seems to tell us that

$$|\vec{n};-\rangle = \sin\frac{\theta}{2}\,|z;+\rangle - \cos\frac{\theta}{2}\,e^{i\phi}\,|z;-\rangle.$$

But now if we take $\theta = 0$, we're supposed to end up with the minus state along the direction of the z-axis, which is the $|z;-\rangle$ state. But now the second term is not so well-defined, because $\phi$ can be anything! So instead it's better to multiply through by the phase so that

$$|\vec{n};-\rangle = \sin\frac{\theta}{2}\,e^{-i\phi}\,|z;+\rangle - \cos\frac{\theta}{2}\,|z;-\rangle.$$


We've now basically done everything that’s possible to do without reviewing linear algebra — that’s what we'll do soon. 
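The two states along an arbitrary direction can be checked numerically. The following Python sketch assumes units with ħ = 1 and uses the convention $|\vec{n};+\rangle = (\cos\frac{\theta}{2}, \sin\frac{\theta}{2}e^{i\phi})$, $|\vec{n};-\rangle = (\sin\frac{\theta}{2}e^{-i\phi}, -\cos\frac{\theta}{2})$ in the z-basis; the function names are illustrative only.

```python
import cmath
import math

HBAR = 1.0  # assumption: units with hbar = 1


def spin_along(theta, phi):
    """Matrix of S_n for n = (sin(theta)cos(phi), sin(theta)sin(phi), cos(theta))."""
    off_diagonal = HBAR / 2 * math.sin(theta)
    return [
        [HBAR / 2 * math.cos(theta), off_diagonal * cmath.exp(-1j * phi)],
        [off_diagonal * cmath.exp(1j * phi), -HBAR / 2 * math.cos(theta)],
    ]


def n_plus(theta, phi):
    """Components of |n;+> in the z-basis."""
    return [math.cos(theta / 2), math.sin(theta / 2) * cmath.exp(1j * phi)]


def n_minus(theta, phi):
    """Components of |n;-> in the z-basis (phase chosen so theta = 0 is well defined)."""
    return [math.sin(theta / 2) * cmath.exp(-1j * phi), -math.cos(theta / 2)]


def apply(matrix, vector):
    """Matrix-vector product for a 2x2 matrix."""
    return [sum(matrix[i][k] * vector[k] for k in range(2)) for i in range(2)]


def is_eigenvector(matrix, vector, eigenvalue, tol=1e-12):
    """True if matrix @ vector equals eigenvalue * vector entry by entry."""
    image = apply(matrix, vector)
    return all(abs(image[i] - eigenvalue * vector[i]) < tol for i in range(2))


theta, phi = 1.1, 0.7  # arbitrary sample direction
s_n = spin_along(theta, phi)
print(is_eigenvector(s_n, n_plus(theta, phi), HBAR / 2))
print(is_eigenvector(s_n, n_minus(theta, phi), -HBAR / 2))
```

Setting `theta = 0` in `n_minus` returns `[0, -1]` regardless of `phi`, which is the $|z;-\rangle$ state up to an overall sign.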


Fact 23

Notice that $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = 1$, and this actually tells us something about the eigenvalues of these matrices.

In general, if a matrix M satisfies a matrix equation like $M^2 + \alpha M + \beta 1 = 0$, then the eigenvalues also satisfy the same equation. This is because we can let both sides of this equation act on an eigenvector v of eigenvalue $\lambda$: notice that $M^2 v = \lambda M v = \lambda^2 v$, so

$$0 = (M^2 + \alpha M + \beta 1)v = \lambda^2 v + \alpha\lambda v + \beta v.$$

Since this is true for an eigenvector v, which is defined to be nonzero, we must have $\lambda^2 + \alpha\lambda + \beta = 0$, which is the same equation as we had for our matrix equation!

So what this tells us is that $\sigma_1$'s eigenvalues satisfy $\lambda^2 = 1$, and thus we have eigenvalues of $\pm 1$. It's possible that they're both 1 or both −1, but now we can use the trace: the sum of the diagonal entries of our matrix is also the sum of the eigenvalues! Since this trace is 0, we must have one eigenvalue be 1 and the other be −1, as we calculated earlier.

Let's talk some more about these Pauli matrices now: remember that our spin operators are $\frac{\hbar}{2}$ times the Pauli matrices, and we have the algebra for angular momentum

$$[S_i, S_j] = i\hbar\,\epsilon_{ijk} S_k.$$

Plugging in the corresponding Pauli matrices, we find that

$$[\sigma_i, \sigma_j] = 2i\,\epsilon_{ijk}\sigma_k.$$

It also turns out that there's a nice property with anticommutators: we have

$$\sigma_1\sigma_2 = -\sigma_2\sigma_1,$$

so the two matrices actually anticommute. We denote this with the anticommutator $\{A, B\} = AB + BA$, and the general form of this identity is that

$$\{\sigma_i, \sigma_j\} = 2\delta_{ij}\,1_{2\times 2}.$$

Since we can write any matrix product as

$$AB = \frac{1}{2}[A, B] + \frac{1}{2}\{A, B\}$$

(by direct expansion), we can plug in the Pauli matrices to find the boxed equation

$$\boxed{\sigma_i\sigma_j = \delta_{ij}\,1 + i\,\epsilon_{ijk}\sigma_k}.$$


To make this look even nicer, we consider the “vector” $\vec{\sigma} = (\sigma_1, \sigma_2, \sigma_3)$. Then we can take the dot product with a vector $\vec{a} = (a_1, a_2, a_3)$:

$$\vec{a}\cdot\vec{\sigma} = a_1\sigma_1 + a_2\sigma_2 + a_3\sigma_3 = a_i\sigma_i$$

with the repeated index notation. Now, if we multiply the boxed equation above by $a_i b_j$, we find that

$$a_i\sigma_i\,b_j\sigma_j = a_i b_j\delta_{ij}\,1 + i\,\epsilon_{ijk}\,a_i b_j\sigma_k.$$

And now we can write this in a neater form: we have

$$(\vec{a}\cdot\vec{\sigma})(\vec{b}\cdot\vec{\sigma}) = a_i b_i\,1 + i\,\epsilon_{ijk}\,a_i b_j\sigma_k,$$

and now the right hand side actually has to do with dot and cross products:

$$\boxed{(\vec{a}\cdot\vec{\sigma})(\vec{b}\cdot\vec{\sigma}) = (\vec{a}\cdot\vec{b})\,1 + i\,(\vec{a}\times\vec{b})\cdot\vec{\sigma}}.$$

This has now represented a product of Pauli matrices in a geometric way, and this is very useful for doing calculations!
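The product identity for $(\vec{a}\cdot\vec{\sigma})(\vec{b}\cdot\vec{\sigma})$ can be spot-checked numerically for real vectors. This is a sketch with hypothetical helper names (`dot_sigma`, `identity_holds`), not a proof.

```python
# Sketch: check (a.sigma)(b.sigma) = (a.b) 1 + i (a x b).sigma for sample vectors.
# All names here are illustrative, not from the lecture.
SIGMA = [
    [[0, 1], [1, 0]],      # sigma_1
    [[0, -1j], [1j, 0]],   # sigma_2
    [[1, 0], [0, -1]],     # sigma_3
]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dot_sigma(v):
    """The 2x2 matrix v_1 sigma_1 + v_2 sigma_2 + v_3 sigma_3."""
    return [[sum(v[n] * SIGMA[n][i][j] for n in range(3)) for j in range(2)]
            for i in range(2)]


def dot(a, b):
    """Ordinary dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Ordinary cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def identity_holds(a, b, tol=1e-12):
    """Compare both sides of the identity entry by entry."""
    lhs = matmul(dot_sigma(a), dot_sigma(b))
    cross_part = dot_sigma(cross(a, b))
    rhs = [[dot(a, b) * (1 if i == j else 0) + 1j * cross_part[i][j]
            for j in range(2)] for i in range(2)]
    return all(abs(lhs[i][j] - rhs[i][j]) < tol
               for i in range(2) for j in range(2))


print(identity_holds((1.0, 2.0, 3.0), (-0.5, 0.25, 4.0)))
```

Choosing `a == b` makes the cross product vanish, so the same check covers the special case where the square of $\vec{a}\cdot\vec{\sigma}$ is $|\vec{a}|^2$ times the identity.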


Example 24

Let's say that $\vec{a} = \vec{b} = \vec{n}$ is a unit vector. How does the above equation simplify?

Plugging everything in, we find that

$$(\vec{n}\cdot\vec{\sigma})^2 = 1.$$

This is useful, because it makes it easier for us to understand the operator $S_n$ from last time (which we defined to be $\vec{n}\cdot\vec{S} = \frac{\hbar}{2}\,\vec{n}\cdot\vec{\sigma}$). If we square this, we end up with $\left(\frac{\hbar}{2}\right)^2 1$. Note that the trace of the $S_n$ operator is also zero (this is because we're adding a linear combination of $\sigma_1, \sigma_2, \sigma_3$, which are all traceless), and thus by the same argument, we know that the eigenvalues of the matrix $S_n$ have to be $\frac{\hbar}{2}$ and $-\frac{\hbar}{2}$. We didn't need to calculate the eigenvalues directly!

And this is fundamental because we've now discovered a key property of spin: if we measure it along any arbitrary direction, we'll always find $\frac{\hbar}{2}$ or $-\frac{\hbar}{2}$. (And this makes sense: the universe is isotropic, so the direction should not matter here.)

We'll finish with a bit of an aside: if we have two triplets of operators $\vec{X} = (X_1, X_2, X_3)$ and $\vec{Y} = (Y_1, Y_2, Y_3)$, we can define their dot product

$$\vec{X}\cdot\vec{Y} = X_i Y_i,$$

where we sum over i. (Such a dot product may not commute, because operators don't always commute.) And we can similarly define the cross-product

$$(\vec{X}\times\vec{Y})_k = \epsilon_{ijk} X_i Y_j,$$

just like we do with number-valued vectors. Again, this cross product does not need to satisfy the same properties as the normal cross product: even $\vec{X}\times\vec{X}$ may be nonzero! So one thing we'll be asked to compute is the value of $\vec{S}\times\vec{S}$, and it'll be an interesting result.


4 February 5, 2020 


We'll talk a bit about the axioms of quantum mechanics today, which will be pretty important for us in connection 


with the 8.051 material. And we'll do some practice with the variational principle as well. 
As a reminder, lecture questions are due 9 am before every Monday and Wednesday (except for today). We'll be moving now into spin 1/2, and there's the issue of manipulating indices: we should watch the index manipulation video if we haven't worked with this before. Starting next week, there will be a lot of emphasis on linear algebra.


Fact 25 


Tentative office hours have been announced (Wednesday 11-12 and Thursday 1:30-2:30). 


Here are the axioms of quantum mechanics:

• States exist. A state is a complete description of a physical system, and it is a ray in Hilbert space.

A Hilbert space is a complex vector space. This does not mean the vectors are complex numbers, only the scalars! Geometrically, a ray in Hilbert space represents a one-dimensional subspace (the collection of all vectors along a specific line), because $\psi$ and $c\psi$ represent the same state for any nonzero complex number c. But we'll mostly work with normalized wavefunctions, so we'll pick c so that our vectors have length 1.

We'll talk more about Hilbert spaces later, but one important thing is that there is an inner product

$$(\phi, \psi) \in \mathbb{C}$$

which satisfies the properties

$$(\phi, \phi) \in \mathbb{R}_{\geq 0}, \quad (\phi, \phi) = 0 \implies \phi = 0.$$

This means that the norm

$$\|\phi\| = \sqrt{(\phi, \phi)}$$

of any state is nonnegative (and only zero if $\phi$ itself is zero). One last important property is that $(\phi, \psi) = (\psi, \phi)^*$ (conjugates of each other). Hilbert spaces can be finite-dimensional or infinite-dimensional, and the latter makes things a bit more difficult to work with.
• Observables exist. An observable is a Hermitian operator, meaning that $A^\dagger = A$.

We'll discuss the spectral theorem later in this class, which is important: eigenstates of Hermitian operators give an orthonormal basis for the Hilbert space! Basically, we can write any such operator in a finite-dimensional vector space of dimension N as

$$A = \sum_{n=1}^{N} a_n E_n,$$

where $a_n$ are actually the eigenvalues of A and $E_n$ are called the orthogonal projectors. (A projector P satisfies $P^2 = P$.)


• Measurement postulate. Measurement of an observable A results in the system becoming an eigenstate of A.

We'll end up in a new state

$$\frac{E_n\psi}{\|E_n\psi\|}$$

for some fixed n (though we don't know what n is). Remember that $E_n$ is a projector, so it moves us to an eigenstate, and then the denominator normalizes us to unit norm. One important note is that the probability that a measurement of A yields $a_n$ (the eigenvalue for $E_n$) is the inner product $(\psi, E_n\psi)$.

This is a shocking admission that we can't really know things in quantum mechanics! Suppose we throw linearly polarized photons into a polarizer at an angle. Then some fraction of the photons will go through, but who decides which ones go through and which don't? There seems to be some irreducible unpredictability in the universe, which people have trouble with.


• Dynamics. The evolution of any state $\psi$ in time is governed by a unitary operator $U(t)$:

$$\psi(t) = U(t)\,\psi(t = 0).$$

(A unitary operator is an operator that preserves lengths of vectors.) We'll see in a few weeks that even though we know the Schrödinger equation governs dynamics, this postulate alone is actually enough to show where the equation $i\hbar\frac{\partial\psi}{\partial t} = H\psi$ comes from (we'll build H from U; it's Hermitian and has units of energy).

• Composite systems. Say we have two quantum systems A and B in Hilbert spaces $\mathcal{H}_A$ and $\mathcal{H}_B$. If we want to describe the two systems together, we live in the tensor product $\mathcal{H}_A \otimes \mathcal{H}_B$.


No one has found a logical problem or experimental situation that has contradicted these axioms yet, and so 
discussions and attempts at interpretation have been ongoing — no good consensus has been found yet! 
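To make the measurement postulate above concrete, here is a small Python sketch for a measurement of $\sigma_1$ (spin along x, up to the factor $\frac{\hbar}{2}$) on the state $|z;+\rangle$. The projectors are built by hand as $E_\pm = |x;\pm\rangle\langle x;\pm|$, and the helper names `probability` and `collapse` are illustrative assumptions rather than notation from the lecture.

```python
import math

# Orthogonal projectors onto the sigma_1 eigenstates |x;+> and |x;->.
E_PLUS = [[0.5, 0.5], [0.5, 0.5]]
E_MINUS = [[0.5, -0.5], [-0.5, 0.5]]


def apply(matrix, vector):
    """Matrix-vector product for a 2x2 matrix."""
    return [sum(matrix[i][k] * vector[k] for k in range(2)) for i in range(2)]


def inner(u, v):
    """Inner product (u, v), conjugate-linear in the first slot."""
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))


def probability(projector, state):
    """Probability (psi, E psi) of the outcome associated with the projector."""
    return inner(state, apply(projector, state)).real


def collapse(projector, state):
    """Post-measurement state E psi / ||E psi|| (undefined if the probability is 0)."""
    image = apply(projector, state)
    norm = math.sqrt(inner(image, image).real)
    return [x / norm for x in image]


state = [1, 0]  # |z;+> in the z-basis
print(probability(E_PLUS, state), probability(E_MINUS, state))
print(collapse(E_PLUS, state))
```

With these projectors, $\sigma_1 = (+1)E_+ + (-1)E_-$, which is the spectral decomposition $A = \sum_n a_n E_n$ for this particular observable.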


We'll finish today by briefly talking about the variational principle: 


Proposition 26 (Variational principle)

If we know the Hamiltonian H for a system, we can use any normalized trial wavefunction $\psi(x)$. Then the ground state energy satisfies

$$E_{gs} \leq (\psi, H\psi).$$

(Equality holds when $\psi$ is the ground state eigenstate.)


So we can try better and better trial wavefunctions and get better and better bounds for the energy ground state! 


The proof is fairly nice, and there are generalizations that will be explored soon. 
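A toy numerical illustration of the variational bound, in Python. The 2 × 2 real symmetric matrix `H` below is a hypothetical stand-in for a Hamiltonian (it is not from the lecture), and the trial states are restricted to the real family $(\cos t, \sin t)$, so this is only a sketch of the idea.

```python
import math

# Hypothetical toy Hamiltonian (real symmetric, so Hermitian).
H = [[1.0, 2.0], [2.0, -1.0]]


def ground_state_energy(h):
    """Smallest eigenvalue of a real symmetric 2x2 matrix, in closed form."""
    mean = (h[0][0] + h[1][1]) / 2
    radius = math.hypot((h[0][0] - h[1][1]) / 2, h[0][1])
    return mean - radius


def energy_expectation(h, t):
    """(psi, H psi) for the normalized trial state psi = (cos t, sin t)."""
    psi = [math.cos(t), math.sin(t)]
    h_psi = [sum(h[i][k] * psi[k] for k in range(2)) for i in range(2)]
    return sum(psi[i] * h_psi[i] for i in range(2))


def best_bound(h, samples=1000):
    """Lowest expectation value found by scanning the trial parameter t."""
    return min(energy_expectation(h, math.pi * k / samples)
               for k in range(samples))


print(ground_state_energy(H), best_bound(H))
```

Every trial state gives an upper bound on the ground state energy, so scanning more trial states can only bring the best bound down toward the true value.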


5 Linear Algebra — Vector Spaces and Operators, Part 1 


We'll be doing linear algebra slightly differently from how it’s been done in 8.05 in the past: MIT uses a book called 
Linear Algebra Done Right (by Sheldon Axler) for the 18.700 linear algebra class. It’s a bit difficult to learn things 
this way if we haven't heard of a matrix or determinant or eigenvalue before, but the book is a very nice way of learning 


if we've heard those words before! 


Fact 27 


We need to study this book pretty seriously if we want to grab the results that we want out of it. This is because 


the book builds up from basic properties and develops theorems and ideas in a logical progression. 


So we'll be introducing linear algebra in the same way in this class — it’s not too clear how much detail we'll need, 
but we'll try to make the main points about structure in a vector space clear. Otherwise, we might miss some 
important basic ideas! (For example, many physicists don’t quite realize that in the matrix representations, we don’t 
need bras and kets. And also, there is a big difference between a real and complex vector space, which is a detail that 
we might miss otherwise. ) 

We'll begin by talking about vector spaces and dimensionality. Throughout this part of the class, we should 
remember that the end result is a vector space of states in our physical system, where observables are linear operators. 


So we'll need to understand all of those individual properties to make some more progress. 
In a vector space, we have two kinds of objects: numbers and vectors. If the numbers are real (resp: complex),
we have a real (resp: complex) vector space. (Vectors aren't “real” or “complex” or anything like that.) There are two 
key operations: we can add vectors, and we can multiply vectors by numbers. 

It turns out that the set of numbers we'll be using in this class, often either the real numbers R or complex numbers 
C, form a field. We won't define what this is, but we'll use the notation F to denote either kind of field. Let's now 


set up our formal definition: 


Definition 28 


A vector space V is a set of vectors equipped with an addition operation +, which takes in two vectors $u, v \in V$ and gives us a vector $u + v \in V$. We also have a scalar multiplication operation by elements of $\mathbb{F}$, such that $av \in V$ for any $a \in \mathbb{F}$ and $v \in V$. In addition, these operations must follow the axioms defined in Definition 29.


Here, the vector space is closed under addition: we can’t get out of it if we just keep adding vectors. (And similarly, 


it’s also closed under scalar multiplication.) 


Definition 29 (Vector space axioms) 


A vector space V must satisfy the following properties:

• $u + v = v + u$ (addition is commutative) and $u + (v + w) = (u + v) + w$ (addition is associative) for any $u, v, w \in V$.

• $a(bv) = (ab)v$ for any $a, b \in \mathbb{F}$ and $v \in V$.

• There is an additive identity $0 \in V$ such that $v + 0 = v$ for all $v \in V$, and a multiplicative identity $1 \in \mathbb{F}$ such that $1 \cdot v = v$ for all $v \in V$.

• Additive inverses exist: for any $v \in V$, there is a $u \in V$ such that $u + v = 0$.

• We have distributivity between multiplication and addition: $a(u + v) = au + av$ and $(a + b)v = av + bv$ for $a, b \in \mathbb{F}$ and $u, v \in V$.


Remark 30. There can often be a bit of confusion between the 0 number and the 0 vector, so we should watch out 
for that. 


This seems like a lot of properties, but it’s a good set of “minimal requirements” — from this, we can show lots of 


things with little proofs pretty immediately. Here’s a quick example: 


Lemma 31 


The additive identity 0 € V is unique. 


Proof. Suppose there were two zero vectors $0, 0'$. Then $0 = 0 + 0' = 0'$, so we must have $0 = 0'$.


Similarly, we can show that $0v = 0$ for any vector v. Here, we should be careful: the 0 on the left side is in $\mathbb{F}$, and the 0 on the right side is in V. We can also find that $a0 = 0$ and so on: basically, the zero vector and zero number do exactly what we expect them to do.

It also turns out that the additive inverse of v, which we denote $-v$, is unique and is equal to $(-1)\cdot v$. This might seem silly, but it's worth trying to prove these facts, because they are not completely trivial!


Let's do a few examples: the main thing to keep in mind is that vectors are not real or complex. 
Example 32

The N-component vectors $\begin{pmatrix} a_1 \\ \vdots \\ a_N \end{pmatrix}$, with all entries $a_i \in \mathbb{R}$, form a vector space over $\mathbb{R}$ (a real vector space).

We have to think for a second to see if we believe all of the axioms for vector spaces here, but the definitions of addition and multiplication are pretty easy: we just do everything component-wise. And then it's easy to find the zero vector (it's just the one with all components 0) and the additive inverse (it's where we take the negative of each entry). So if we understand addition and multiplication, the rest is not too difficult.


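Example 33

The M × N matrices with complex entries, $\begin{pmatrix} a_{11} & \cdots & a_{1N} \\ a_{21} & \cdots & a_{2N} \\ \vdots & \ddots & \vdots \\ a_{M1} & \cdots & a_{MN} \end{pmatrix}$, form a complex vector space.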


Addition and multiplication also look pretty familiar here, because everything is done entry by entry again. 


Example 34 


The set of 2 × 2 Hermitian matrices forms a real vector space.

This might look a bit surprising, because Hermitian matrices have i's in the entries: remember that the most general matrix looks like $\begin{pmatrix} c + d & a - ib \\ a + ib & c - d \end{pmatrix}$, where a, b, c, d are real numbers. The reason that we must have a real vector space is that multiplying by complex numbers doesn't preserve Hermiticity (for instance, multiplying by i is not allowed). So this might feel a bit weird: something like $\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$ is a vector over the real numbers!

Example 35 

The set of all polynomials, each of the form $p(z) = a_0 + a_1 z + \cdots + a_n z^n$ for $a_i \in \mathbb{F}$ and some nonnegative integer n, forms an $\mathbb{F}$-vector space.


To verify this, note that we sum polynomials by combining terms of the same exponent, and we multiply a polynomial 
by a number by multiplying each term by a number. (And we don’t need to worry about multiplying two polynomials 
together, because that’s not a thing we need to do with this vector space.) 

This vector space looks like it is infinite dimensional: we have the constant, linear, quadratic, cubic, quartic 


polynomials, and so on. And we'll soon see that this is indeed the case. 


Example 36 


The set $\mathbb{F}^\infty$ of infinite sequences $(x_1, x_2, \cdots)$ (where $x_i \in \mathbb{F}$) is an $\mathbb{F}$-vector space.


Addition and multiplication are defined the same way as we usually do. 
Example 37


The set of complex-valued functions f(x) on an interval x € [0, L] is a complex vector space. 


These last three examples seem to be infinite dimensional, and now we're going to try to formalize the ideas around 


that. We'll start by understanding the concept of a subspace: 


Definition 38 


A subspace of a vector space V is a subset W of the vectors of V in which W is also a vector space. 


There are some conditions that are necessary for such a subspace to exist: it must contain the zero vector, 
because every vector space has a zero vector. In general, we also need to check that our subspace is closed under 


vector addition and scalar multiplication — the extra axioms will automatically follow, because W is just a subset. 


Example 39 


Let's consider a two-dimensional real vector space $V = \mathbb{R}^2$: vectors here can be written as $(v_1, v_2)$, where $v_1, v_2 \in \mathbb{R}$. Consider the subset W of vectors where $3v_1 + 4v_2 = a$ for some real number a.

When is such a collection of vectors a subspace? First of all, W must contain the zero vector (0, 0), so we need $3\cdot 0 + 4\cdot 0 = a$, which means W is only a subspace if a = 0. And now we just need to check the closure properties, and those are pretty easy. For example, if we have a vector $(v_1, v_2) \in W$ and we multiply by $\lambda$, then $(\lambda v_1, \lambda v_2)$ is indeed in W, because

$$3(\lambda v_1) + 4(\lambda v_2) = \lambda(3v_1 + 4v_2) = 0$$

by the assumption that $(v_1, v_2)$ is in W.
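The subspace test for the line $3v_1 + 4v_2 = a$ can be mimicked on a handful of sample points. The Python sketch below uses integer points so that equality checks are exact; it only samples finitely many vectors, so it illustrates the closure conditions rather than proving them, and the function names are hypothetical.

```python
def in_w(v, a):
    """Membership test for W = {(v1, v2) : 3*v1 + 4*v2 = a}."""
    return 3 * v[0] + 4 * v[1] == a


def sample_members(a, count=5):
    """Integer points on the line 3*v1 + 4*v2 = a (a must be an integer)."""
    return [(-a + 4 * t, a - 3 * t) for t in range(-count, count + 1)]


def looks_like_subspace(a):
    """Check the zero vector and closure under + and scaling on sample points."""
    members = sample_members(a)
    has_zero = in_w((0, 0), a)
    closed_add = all(in_w((u[0] + v[0], u[1] + v[1]), a)
                     for u in members for v in members)
    closed_scale = all(in_w((k * v[0], k * v[1]), a)
                       for k in (-2, 0, 3) for v in members)
    return has_zero and closed_add and closed_scale


print(looks_like_subspace(0), looks_like_subspace(5))
```

For a nonzero value of `a` the check already fails at the zero vector, which matches the argument in the text.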

So once we have the concept of a subspace, we might ask whether we can understand a large, complicated vector space by just understanding its subspaces. The answer is yes, and the main idea is to “break up the space” as much as possible. Our goal is to fill up the full vector space with a bunch of different subspaces, and this will become much more important when we talk about eigenvectors and eigenvalues later!


Definition 40 


A vector space V is a direct sum of subspaces $U_1, \cdots, U_m$, denoted

$$V = U_1 \oplus U_2 \oplus \cdots \oplus U_m,$$

if any vector in V can be written uniquely as a sum $u_1 + u_2 + \cdots + u_m$, where each $u_i \in U_i$.

One way to think about this decomposition is that picking a vector in V is equivalent to picking a unique representative $u_1, \cdots, u_m$ from each of the subspaces $U_1, \cdots, U_m$. So one important property is that the vector spaces cannot overlap except at the zero vector: we must have

$$U_i \cap U_j = \{0\}$$

for all $i \neq j$. It's worth thinking through why this is true, but the basic idea is that any vector that is in both $U_i$ and $U_j$ could be put in either one, and we would not have a unique way of writing down v as a sum $u_1 + \cdots + u_m$. (Then we would have a sum, but not a direct sum.)
Example 41

Let's look at the subspaces of $V = \mathbb{R}^2$.

We have a two-dimensional space, and there's no way to get a two-dimensional subspace of V other than the whole space itself: we can convince ourselves that once a subspace contains two vectors pointing in different directions, it must contain the rest of the space. So the only other subspaces are one-dimensional, which are just lines through the origin, and zero-dimensional, which is just the zero vector. Indeed, lines are subspaces, because adding two vectors on the line gives us another on the line, and so does scaling a vector by a real number.

So now let's pick $U_1$ to be the “horizontal” axis and $U_2$ to be the “vertical” axis in $\mathbb{R}^2$: both of these are one-dimensional subspaces. And indeed, every vector in $\mathbb{R}^2$ can be written uniquely as a sum of a horizontal part and a vertical part, so we have

$$\mathbb{R}^2 = U_1 \oplus U_2.$$

But we can change things a little bit: instead of using the vertical axis $U_2$, let's use the subspace of vectors $(v_1, v_2)$ where $v_1 = v_2$, which is a “diagonal line” $U_2'$. This is a little bit more complicated, but the parallelogram law tells us that we can indeed decompose any vector in $\mathbb{R}^2$ as a sum of a vector in $U_1$ and a vector in $U_2'$, so we again have

$$\mathbb{R}^2 = U_1 \oplus U_2'.$$

In general, any two lines through the origin (that don't coincide) will direct sum to $\mathbb{R}^2$, but if we try to add in a third line, we will no longer get a unique representation of a vector in $\mathbb{R}^2$. So this is the first step to understanding the concept of dimension: we can't have three lines that direct sum to $\mathbb{R}^2$, and indeed we're now going to figure out why we can call $\mathbb{R}^2$ two-dimensional.
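The two decompositions of $\mathbb{R}^2$ described above can be written out explicitly. In this Python sketch, `split_axes` and `split_horizontal_diagonal` are hypothetical helper names; each returns the unique pair of components whose sum is the input vector.

```python
def split_axes(v):
    """Write v as (horizontal part) + (vertical part)."""
    return (v[0], 0), (0, v[1])


def split_horizontal_diagonal(v):
    """Write v as (horizontal part) + (part on the diagonal line v1 = v2).

    The diagonal part must be (s, s) with s = v[1] to match the second
    coordinate, which then forces the horizontal part to be (v[0] - v[1], 0).
    """
    return (v[0] - v[1], 0), (v[1], v[1])


def add(u, w):
    """Component-wise sum of two vectors in R^2."""
    return (u[0] + w[0], u[1] + w[1])


horizontal, diagonal = split_horizontal_diagonal((5, 2))
print(horizontal, diagonal, add(horizontal, diagonal))
```

The docstring of `split_horizontal_diagonal` is exactly the uniqueness argument: each coordinate equation leaves no freedom, which is what makes the sum direct.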


Remark 42. This kind of logic will be necessary for understanding dimensionality better in more complicated vector 
spaces. For example, the space of states for a particle in a central potential is infinite-dimensional, but we can break 


it down into easier-to-understand subspaces to talk about the evolution of the wavefunction! 


To understand dimension, we're going to introduce a few more concepts for rigor, so that we can also talk about infinite-dimensional vector spaces. Consider a list of vectors, which is just a list $(v_1, \cdots, v_n)$ (which must be of finite length) where $v_1, \cdots, v_n \in V$. Then there are a few useful concepts we can extract:


Definition 43 


The span of a list of vectors $(v_1, \cdots, v_n)$ is the set of linear combinations of the form

$$a_1 v_1 + a_2 v_2 + \cdots + a_n v_n, \quad a_i \in \mathbb{F}.$$

A list spans the vector space if the span of the list is the whole vector space.


This is basically the set of vectors that we can reach by taking some combination in our list. 


Definition 44 


A vector space V is finite-dimensional if it's spanned by some list of vectors. (Otherwise, it is infinite-dimensional, which means no list spans the whole space.)


This definition has been made in a nice way so that we can work with it: 




Proposition 45 


The space of polynomials is infinite-dimensional. 


Proof. Suppose otherwise; then there is a list of polynomials that spans our space. But because we have a finite list, there is some highest degree $d$ (perhaps very large) among all of our polynomials. And then we can't use a linear combination of our polynomials to get anything of higher degree (say $z^{d+1}$), which is a contradiction. Thus the space must be infinite-dimensional.


Proposition 46 


In contrast, our first example — the set of N-component vectors — is finite-dimensional. 


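Proof. We just need to produce a list that spans the whole space: we just use

$$e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \cdots, \quad e_N = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}.$$

And then a general vector is just of the form $a_1 e_1 + a_2 e_2 + \cdots + a_N e_N$.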


Definition 47 


A list of vectors $(v_1, v_2, \cdots, v_n)$ is linearly independent if

$$a_1 v_1 + a_2 v_2 + \cdots + a_n v_n = 0$$

only has the solution $a_1 = a_2 = \cdots = a_n = 0$.


In other words, if we want to represent the zero vector with our list, we have to set all of the coefficients to 0. 


Definition 48 


A basis of a vector space V is a list of linearly independent vectors which spans V. 


Basically, we need to have enough vectors to span all of V, but we shouldn’t have any “extra vectors” that just 
give us redundant information. 

It turns out that any finite dimensional vector space has a basis — this is easy to show — and also that any two 
bases have the same length. So the length is some quantity independent of our basis, and that’s what we'll call our 


dimension: 


Definition 49 


The dimension of a vector space V is the length of any basis of V. 


Remark 50. At the moment, we don’t have an inner product on our vector space yet: we're putting the least amount 


of structure that is necessary. There's a lot of properties that we can already extract without needing an inner product! 
So returning to the example above where we created a list that spans our vector space, this list is also linearly independent: if a linear combination is the zero vector, each entry needs to be 0, so all $a_i$'s are zero. Thus the space of N-component vectors has dimension N. Similarly, we can prove that the space of M × N matrices has dimension MN.


Example 51 


Let's find the dimensionality of the space of $2 \times 2$ Hermitian matrices (viewed as a real vector space).


We'll use the following list of four “vectors:”

\[ (I, \sigma_1, \sigma_2, \sigma_3). \]

This is indeed a list of vectors in our space, because all four of these matrices are Hermitian. It also spans our space by an argument we made earlier on: we can get the general Hermitian matrix

\[ \begin{pmatrix} c + d & a - ib \\ a + ib & c - d \end{pmatrix} \]

(with $a, b, c, d$ real) with the linear combination $cI + a\sigma_1 + b\sigma_2 + d\sigma_3$.

And to show that this list is linearly independent, we just set our matrix to 0: then we need $c + d = c - d = 0$, so $c = d = 0$, and we also need $a + ib = a - ib = 0$, so $a = b = 0$. So indeed, we have a linearly independent list which also spans, and thus the vector space of Hermitian matrices has dimension 4.
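As a small numerical illustration of the spanning argument, here is a sketch that recovers the four real coefficients of a $2 \times 2$ Hermitian matrix and rebuilds the matrix from them. It assumes the standard Pauli matrices (so the lower-left entry of the combination is $a + ib$); the function names `combine` and `hermitian_coefficients` are made up for this example.

```python
# Identity and (standard-convention) Pauli matrices as nested lists.
I2 = [[1, 0], [0, 1]]
SIGMA_1 = [[0, 1], [1, 0]]
SIGMA_2 = [[0, -1j], [1j, 0]]
SIGMA_3 = [[1, 0], [0, -1]]


def combine(c, a, b, d):
    """Return the 2x2 matrix c*I + a*sigma_1 + b*sigma_2 + d*sigma_3."""
    return [
        [c * I2[r][s] + a * SIGMA_1[r][s] + b * SIGMA_2[r][s] + d * SIGMA_3[r][s]
         for s in range(2)]
        for r in range(2)
    ]


def hermitian_coefficients(m):
    """Recover the real coefficients (c, a, b, d) of a 2x2 Hermitian matrix m."""
    c = (m[0][0] + m[1][1]).real / 2   # half the sum of the diagonal entries
    d = (m[0][0] - m[1][1]).real / 2   # half their difference
    lower_left = complex(m[1][0])      # equals a + ib in this convention
    return c, lower_left.real, lower_left.imag, d


example = [[3, 1 - 2j], [1 + 2j, -1]]
coefficients = hermitian_coefficients(example)
rebuilt = combine(*coefficients)
```

The coefficients are unique because each one is read off from a different entry (or combination of entries) of the matrix, which is the linear independence statement in concrete form.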


Remark 52. We can try proving that the space $\mathbb{F}^\infty$ of infinite sequences is infinite-dimensional: it requires a bit of work!


We'll now move on to something else: we want to talk about linear maps from a vector space V to another vector 


space W. A special case of these is a linear operator, where we map from V to the vector space V again. 


Definition 53 


A linear operator $T$ on a vector space $V$ is a function $T : V \to V$ with the properties

\[ T(u + v) = T(u) + T(v), \qquad T(av) = aT(v) \]

for any $u, v \in V$ and $a \in \mathbb{F}$.


The quantum mechanical motivation for this is that observables are represented by operators: expectation 
values, symmetries, and unitary time-evolution all come from these linear operators. 

It’s important for us to realize that T(u) = Tu (both notations are okay) is a vector which is in the image of 
our linear operator. In some sense, we can think of T as doing some kind of multiplication — Tu is basically matrix 
multiplication on a vector! 

The key idea here is that we only need to know how the linear operator works on basis vectors, and that gives us everything we need to know about the operator! This is because every vector can be obtained as a linear combination of the basis vectors, so the “linearity” of the linear operator tells us the value of $T(v)$ for all other vectors.


Lemma 54 


We have T(0) = 0 for any linear operator T. 


Proof. We know that 
T(u) = T(u+0) = T(u) + T(0) 


for any vector u, and now subtracting T(u) from both sides yields T(0) = 0. (We're allowed to subtract because that’s basically adding the additive inverse!)

The best way to understand such operators is to give some examples, so that’s what we'll do now.


Example 55 


Let V be the real vector space of polynomials on a real variable x — that is, vectors are real polynomials p(x). 


Define the derivative operator $T$ which sends a polynomial $p$ to its derivative: $T(p) = p'$. Also define the operator $S$ which multiplies a polynomial by $x$: $S(p) = xp$.


It’s indeed true that the derivative of a sum is the sum of the derivatives, and also that we can “take out” constants 


from a derivative. So T is indeed a linear operator! Similarly, we can check that S is linear by distributivity. 


Example 56 
Consider the vector space $\mathbb{F}^\infty$ of infinite sequences of the form $(x_1, x_2, x_3, \dots)$. Define the left shift operator $L$, which sends such a sequence to $(x_2, x_3, x_4, \dots)$, and similarly define the right shift $R$ which sends the sequence to $(0, x_1, x_2, \dots)$.


We should try writing out the properties: indeed, L and R are both linear operators if we check all of the properties. 
But it’s very important that we need to put the number 0 in the first spot for the right shift R, or else we wouldn't 


have a linear operator! 
Remark 57. This is tangentially related to the famous Hilbert's Hotel problem — doing a right shift opens up a room. 


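$L$ and $R$ look a little bit like inverse operators, but they’re not quite the same: we lose information about $x_1$ when we do a left shift. So a right shift followed by a left shift returns the original sequence, but a left shift followed by a right shift does not.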
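To see the asymmetry between the two shifts concretely, here is a minimal sketch. It represents an infinite sequence as a function from the position $n = 1, 2, 3, \dots$ to the entry $x_n$, which is just one convenient modeling choice; all of the names below are invented for the illustration.

```python
from typing import Callable

Seq = Callable[[int], float]  # n -> x_n, with n = 1, 2, 3, ...


def left_shift(x: Seq) -> Seq:
    """L: (x_1, x_2, x_3, ...) -> (x_2, x_3, x_4, ...)."""
    return lambda n: x(n + 1)


def right_shift(x: Seq) -> Seq:
    """R: (x_1, x_2, x_3, ...) -> (0, x_1, x_2, ...)."""
    return lambda n: 0 if n == 1 else x(n - 1)


def prefix(x: Seq, length: int = 5) -> list:
    """First few entries of a sequence, for inspection."""
    return [x(n) for n in range(1, length + 1)]


def squares(n: int) -> int:
    """The sequence (1, 4, 9, 16, ...)."""
    return n * n


lr = prefix(left_shift(right_shift(squares)))  # R first, then L
rl = prefix(right_shift(left_shift(squares)))  # L first, then R
```

Here `lr` reproduces the first entries of the original sequence, while `rl` has a 0 in the first slot: the entry $x_1$ was discarded by the left shift and cannot be recovered.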


Example 58 


These are the “trivial operators:” the zero operator sends everything to the zero vector, and the identity operator 


sends a vector to itself. 


The main idea here is that even though our linear operator sends V to itself, it doesn’t have to be one-to-one! 


And the zero operator is just an extreme case of this where everything is sent to the same vector in V. 


6 February 10, 2020 


Hopefully we're starting to get accustomed to the pace of this class — it’s a bit fast, especially at the beginning, 
depending on how much we remember from previous classes. So there's quite a bit that we need to remember about 
bound states and the Schrödinger equation and square wells, but some of us might have just not seen very much of
this. We've had a few ideas introduced — spin 1/2, the variational principle, and now we have about a two-week mathy 
interlude where we're talking about linear algebra so that we can do physics precisely. 

Some people have said that in 8.05, there’s too much math. It’s a valid viewpoint, but that criticism sometimes 
comes from other faculty that already understand quantum mechanics — math just didn’t play much of a role in their 
understanding. Professor Zwiebach, though, felt that some parts of his understanding remained fuzzy until he had the 
mathematical formalism. So there might be a 10 percent excess of math, but it will be necessary to help us think about 
projectors, measurements, tensor products, and so on. And with the ideas of quantum information, linear algebra’s role has become even more important!

Fact 59
Lectures 1 and 2 were never due — we have to do them, but there is no associated deadline. But lecture 3 questions 


were due today (at 9am) — if we were confused about due dates or had trouble, we should let the 8.051 team 


know. (And there’s lecture 4 questions due on Wednesday and pset 1 due on Friday.) 


Questions are graded for accuracy, but don't agonize over grading. These are supposed to help us understand 


the material. 


The homework has a lot of parts, so we should get started early on. The advantage of this homework is that it’s 
all based on 8.04 and last week’s material. On our first problem set, there's a variational problem where we have a 


quartic potential

\[ H = \frac{p^2}{2m} + \alpha x^4. \]

If the quartic were a quadratic term, it would be a simple harmonic oscillator and we'd know how to solve it. But with this new potential ($\alpha$ assumed to be positive, by the way), we get a strange oscillator which doesn't even have the usual harmonicity. With the harmonic oscillator, the energy levels are equally spaced, so every energy difference is a multiple of $\hbar\omega$, which tells us that the transition frequencies are also multiples of each other. But a quartic potential breaks the harmonicity: the energy levels will not be equally spaced any more.

Is that good or bad? It turns out that for quantum computation, harmonic oscillators are not good: $|0\rangle$ and $|1\rangle$ usually take up the lowest two energy levels, but the energy difference between those two is the same as the difference to the next energy level! So anything that drives the transition from $|0\rangle$ to $|1\rangle$ also drives the transition from $|1\rangle$ to the next level, and the qubit will not stay in the $|0\rangle$ or $|1\rangle$ state anymore, which is bad. That’s one motivation for why we study different potentials like in this homework problem.


If the Hamiltonian had a harmonic oscillator term

\[ H = \frac{p^2}{2m} + \alpha x^4 + \frac{1}{2} m\omega^2 x^2, \]

the $x^4$ term would be a small correction to the harmonic oscillator potential, and then we calculate things using perturbation theory (this is an 8.06 idea). But for our original problem, we need to use variational methods, because the differential equation can’t be solved analytically. We'll need to calculate some of the energy eigenstates numerically, and we do this with the shooting method.

Specifically, we're trying to solve the equation

\[ -\frac{\hbar^2}{2m} \frac{d^2\psi}{dx^2} + V(x)\psi = E\psi, \]

and we want to first get rid of the $\hbar$, $m$, $\alpha$ before plugging this into a computer (because the energies depend on those quantities in a predictable way). Once we do this, we integrate the differential equation from 0 to some distance, expressed in dimensionless units $u$ (for example, we can integrate from 0 to 3.5 and that’s probably enough), and there are Mathematica instructions on how to do this in the pset PDF.


Fact 60 


Energy eigenstates are normalizable, so we can tweak the parameters and look at when the wavefunction “blows 


up” or “blows down” and look in between those. 
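The shooting idea can be sketched in a few lines of Python (the pset itself uses Mathematica, so this is only an illustration and not the pset's code). The sketch assumes one particular rescaling, in which the dimensionless equation reads $-\frac{1}{2}\psi'' + u^4\psi = e\psi$, and it assumes the ground state is even, so we start from $\psi(0) = 1$, $\psi'(0) = 0$. The function names, the step count, and the energy bracket $[0, 2]$ are all choices made for this example.

```python
def psi_at_end(energy, u_max=3.5, steps=2000):
    """Integrate psi'' = 2 (u**4 - energy) psi from u = 0 to u_max with RK4.

    Uses the even initial condition psi(0) = 1, psi'(0) = 0 and
    returns psi(u_max).
    """
    h = u_max / steps

    def deriv(u, psi, dpsi):
        # First-order system: (psi)' = dpsi, (dpsi)' = 2 (u^4 - e) psi.
        return dpsi, 2.0 * (u ** 4 - energy) * psi

    u, psi, dpsi = 0.0, 1.0, 0.0
    for _ in range(steps):
        k1p, k1d = deriv(u, psi, dpsi)
        k2p, k2d = deriv(u + h / 2, psi + h / 2 * k1p, dpsi + h / 2 * k1d)
        k3p, k3d = deriv(u + h / 2, psi + h / 2 * k2p, dpsi + h / 2 * k2d)
        k4p, k4d = deriv(u + h, psi + h * k3p, dpsi + h * k3d)
        psi += h / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
        dpsi += h / 6 * (k1d + 2 * k2d + 2 * k3d + k4d)
        u += h
    return psi


def ground_state_energy(low=0.0, high=2.0, iterations=40):
    """Bisect on the energy between a 'blows up' and a 'blows down' value."""
    assert psi_at_end(low) > 0 > psi_at_end(high)
    for _ in range(iterations):
        mid = (low + high) / 2
        if psi_at_end(mid) > 0:
            low = mid    # energy too low: psi blows up to +infinity
        else:
            high = mid   # energy too high: psi crosses zero and blows down
    return (low + high) / 2
```

Calling `ground_state_energy()` narrows the bracket until the wavefunction at $u = 3.5$ switches between blowing up and blowing down; the returned value is in the dimensionless units of the assumed rescaling, so it has to be multiplied by the appropriate combination of $\hbar$, $m$, $\alpha$ to get a physical energy.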


Question 61. On Piazza, there was a question about the axioms of vector spaces: our lecture has the axiom $1v = v$ (for all vectors $v$ in the vector space), but Shankar doesn't have this. So what's going on here?


Mathematical objects called fields have two special numbers, 0 and 1. They're the additive and multiplicative identity, respectively: this means that $a + 0 = a$ and $1a = a$ for any $a \in \mathbb{F}$. With this, it’s also simple to show that $0 \cdot a = 0$, so we don't need this as an axiom.

Well, the statement $1v = v$ is completely different, because $v$ is a vector, not an element of our field $\mathbb{F}$! We can try to prove this statement from the other axioms, but it really doesn't work out: the best hope is to try using $(ab)v = a(bv)$, and that isn't enough. So we need to include it in our set of linear algebra axioms. The main point here is that we need to keep vectors and scalars separate in our mind.

We can, however, prove that $0 \cdot v = 0$, where the 0 on the left side is a scalar and the 0 on the right side is the zero vector. So we can think about all of this a bit more if we like the mathematical formalism.


Let’s do some examples with the variational principle: 


Example 62 


Consider two potentials $V_1(x)$ and $V_2(x)$ which satisfy $V_2(x) \le V_1(x)$ for all $x$. Can we show that the ground state energy for $V_2$ is never higher than the ground state energy for $V_1$?


The intuition we have is that “the energy is always higher for $V_1$,” and while this might seem clear, it gives us the
following result. Consider an attractive potential V(x) which is bounded — specifically, it’s nowhere positive, piecewise 
continuous, asymptotically zero, and not zero everywhere. Then there's a famous result that V always has a bound 
state — this is true in one dimension but not in three dimensions! 

How do we show that? We can try to find the bound state, but if V is an arbitrary potential, this is very difficult. 
Instead, we can just “sandwich” a finite square well potential between V(x) and 0 — since the finite square well potential 
has a bound state, this example problem would tell us that the attractive potential V(x) has a bound state (with lower 
energy). 


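Okay, so let’s try to show that the ground state energies satisfy $E_2^{\text{gs}} \le E_1^{\text{gs}}$. We have our two Hamiltonians

\[ H_1 = \frac{p^2}{2m} + V_1(x), \qquad H_2 = \frac{p^2}{2m} + V_2(x), \]

and consider the overlaps

\[ (\psi, H_1\psi), \qquad (\psi, H_2\psi) \]

for a normalized wavefunction $\psi$. Both of these are numbers (they're inner products of the form $(\psi_1, \psi_2) = \int \psi_1^* \psi_2 \, dx$), and specifically, because

\[ (\psi, H_1\psi) = \left(\psi, \frac{p^2}{2m}\psi\right) + \int V_1 |\psi|^2 \, dx \;\ge\; \left(\psi, \frac{p^2}{2m}\psi\right) + \int V_2 |\psi|^2 \, dx = (\psi, H_2\psi) \]

(after all, the kinetic terms on the two sides are identical, the left integral is $\int V_1(x)|\psi|^2 \, dx$, and the right integral is $\int V_2(x)|\psi|^2 \, dx$), we have

\[ (\psi, H_1\psi) \ge (\psi, H_2\psi). \]

So now we want to use this to look at the energy ground states: the variational principle says that

\[ E_2^{\text{gs}} \le (\psi, H_2\psi) \le (\psi, H_1\psi). \]

Remember that this holds for any $\psi$, so we can now pick $\psi$ to be the ground state wavefunction $\psi_{\text{gs},1}$ for $H_1$! And that tells us that

\[ E_2^{\text{gs}} \le (\psi_{\text{gs},1}, H_1 \psi_{\text{gs},1}) = E_1^{\text{gs}}, \]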


and we've indeed shown that the ground state energies are related in the way that we expect: the lower potential has a lower energy ground state.

In the remaining time, we'll talk a bit about the concept of a direct sum in linear algebra. Recall that a vector space $V$ is the direct sum

\[ V = V_1 \oplus V_2 \]

if we can write any vector $v \in V$ uniquely as a vector in $V_1$ plus a vector in $V_2$. One way we can think about this is that we're increasing the dimension of the vector space by adding more axes. In principle, the first vector space $V_1$ has some basis vectors, and $V_2$ adds some more, so the total dimension is

\[ \dim V = \dim V_1 + \dim V_2. \]

(Soon, we'll consider the other operation $V = V_1 \otimes V_2$, in which case the total dimension is $\dim V = \dim V_1 \cdot \dim V_2$.)

One note: just because $V_1 \oplus V_2 = V_1 \oplus V_3$, this doesn't necessarily mean that $V_2 = V_3$. (For example, if $V_1$ is the $x$-axis, $V_2$ can be the $y$-axis and $V_3$ can be the line $y = x$, and both sides give us the $xy$-plane.)

So as an example, the energy eigenstates provide a basis for the whole space in the simple harmonic oscillator. We have a vector $|0\rangle$, the ground state, and we have another vector $|1\rangle = a^\dagger|0\rangle$, the first excited state. In general, we have

\[ |k\rangle = \frac{(a^\dagger)^k}{\sqrt{k!}} |0\rangle, \]

defined in such a way that these are all orthonormal. Well, let $U_0$ be the 1-dimensional vector space which is the span of $|0\rangle$, let $U_1$ be the span of $|1\rangle$, and define $U_k$ in general. Then what we're saying is that the whole state space is

\[ \mathcal{H} = U_0 \oplus U_1 \oplus U_2 \oplus \cdots = \bigoplus_{k=0}^{\infty} U_k. \]

And this is true because the general wavefunction is a linear combination (unique superposition) of our energy eigenstates: we can write

\[ |\psi\rangle = a_0|0\rangle + a_1|1\rangle + \cdots. \]


7 Linear Algebra — Vector Spaces and Operators, Part 2 


Now that we've begun to see a few properties and examples of linear operators, let's try to extract some structure out 


of the set of linear operators. 


Definition 63 


For a vector space V, let L(V) denote the set of linear operators on V. 


It turns out that this set actually forms a new vector space! This is because we can take two operators $S, T \in L(V)$ and define their sum $S + T$, which is the operator satisfying

\[ (S + T)v = Sv + Tv. \]

Similarly, we define a scalar multiple of a linear operator by defining that

\[ (aS)v = a(Sv). \]

With this definition, we already have all of the properties that we need in a vector space for our linear operators: we just need to check that $S + T$ and $aS$ are indeed linear operators by confirming statements of the form

\[ (S + T)(u + v) = (S + T)u + (S + T)v, \]

but it all works out. In addition, we do have an additive identity: it’s the zero operator. So $L(V)$ is a new vector space that we've created, and it’s over the same field $\mathbb{F}$ as our vector space $V$.

Remember that in all of our definitions so far, there hasn't been an obvious way to multiply vectors together. We 
know that there’s a cross product in three dimensions for our ordinary vectors, but there is no such thing in two or 
four dimensions. On the other hand, these operators have a very natural multiplication: for two operators S and T, 
define their product via 

(ST)u = S(Tu). 


In other words, we let $T$ act on our vector first, and then apply $S$ to the result of that. This is a new structure we've added to our vector space! We now need to show that $ST$ is indeed a linear operator: it's pretty simple to verify, but it's worth working it out on our own. And the point now is that we can multiply operators.

There are now a few questions we can ask: is this multiplication commutative or associative, and is there an identity 


or inverse element? 


• It turns out that associativity is true: this is because $S(TU) = (ST)U$ holds for any three linear operators $S, T, U$, and in both cases we just apply $U$, then $T$, then $S$ to our vector.
• There is an identity element: it’s the identity operator, which sends every vector to itself. Call this operator $I$.
• Operators do not always have inverses (for example, consider the zero operator).
• Finally, operators are not always commutative ($ST$ and $TS$ are not always the same).


This last point is pretty important for quantum mechanics, and it'll relate to the concept of a commutator, which 


measures the difference between the two operators AB and BA. Basically, we define the quantity 
[A, B] = AB — BA, 


and this can often have important physical implications. 


Let's study an illustrative example: 


Example 64 


Consider the two operators from before on our vector space of polynomials: T differentiates the polynomial, and 


S multiplies it by x. 


The product of $T$ and $S$ is some linear operator, which we can't figure out until we see how it acts on a polynomial. We can try to have it act on a general polynomial, but we don't need to do that: acting on the simple basis elements is enough. So let’s apply this on $x^n$:

\[ TS(x^n) = T(x^{n+1}) = (n+1)x^n. \]

On the other hand, we can also look at the other way around: we evaluate the product $ST$ to be

\[ ST(x^n) = S(nx^{n-1}) = nx^n. \]

Indeed, $TS$ and $ST$ are not the same, and the commutator $[T, S] = TS - ST$ is an operator such that

\[ [T, S]x^n = (TS - ST)x^n = (n+1)x^n - nx^n = x^n. \]

In other words, the commutator $[T, S]$ is actually the identity operator $I$, because we can repeat this argument for any $x^n$. And this commutation relation has to do with the usual quantum mechanics commutation relation between $\hat{x}$ and $\hat{p}$, which are indeed multiplication by $x$ and an $x$-derivative (up to constant factors).

So in summary, we've now put an extra structure on our linear operators. As a nice exercise, we can try computing the commutator $[L, R]$ between the left and right shifts on our infinite sequences.
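The computation $[T, S] = I$ for the derivative and multiply-by-$x$ operators can also be checked on explicit polynomials. In the sketch below a polynomial $c_0 + c_1 x + c_2 x^2 + \cdots$ is stored as the coefficient list `[c0, c1, c2, ...]`; this representation and the function names are just choices for the illustration.

```python
def derivative(p):
    """T: differentiate a polynomial given as coefficients [c0, c1, c2, ...]."""
    return [k * p[k] for k in range(1, len(p))] or [0]


def times_x(p):
    """S: multiply a polynomial by x, which shifts every coefficient up by one."""
    return [0] + list(p)


def subtract(p, q):
    """Coefficient-wise difference p - q, with trailing zeros removed."""
    size = max(len(p), len(q))
    p = list(p) + [0] * (size - len(p))
    q = list(q) + [0] * (size - len(q))
    diff = [pk - qk for pk, qk in zip(p, q)]
    while len(diff) > 1 and diff[-1] == 0:
        diff.pop()
    return diff


def commutator_ts(p):
    """Apply [T, S] = TS - ST to the polynomial p."""
    return subtract(derivative(times_x(p)), times_x(derivative(p)))


p = [5, 0, -2, 7]           # 5 - 2x^2 + 7x^3
result = commutator_ts(p)   # should reproduce p itself
```

Because the commutator returns every polynomial unchanged, it acts as the identity operator, in agreement with the argument on the basis elements $x^n$.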

With that out of the way, we'll now move on to some more linear algebra: we're going to extract a few more basic 
properties out of our operators. There’s always a few basic questions to ask when we encounter a new object, and in 
this case knowing these answers tells us a lot about the linear operator! 

There are basically two ways to characterize our linear operators T € L(V), related to injectivity and surjectivity, 


respectively. 


Definition 65 


The null space of a linear operator $T$ is the set of vectors $v \in V$ such that $T(v) = 0$.


These are the objects that are being “nullified” by $T$, and it turns out this set of vectors is a subspace of $V$! So there's a bit more here than just having a set of vectors. Indeed, we can check that if $u, v \in \text{null}(T)$, then $u + v \in \text{null}(T)$ as well, and so is $au$. This is closely related to the next definition here:


Definition 66 


A linear operator $T$ is injective or one-to-one if different vectors end up in different places under $T$: that is, if $T(u) = T(v)$, then $u = v$.


But there might be a more useful way of representing this idea: “one-to-one” might not represent injectivity very well, because of course $T$ takes in a vector and outputs another vector. So Sean Carroll, a professor at Caltech, has suggested using the word “two-to-two” instead. Indeed, another way to phrase the above idea is that

\[ u \ne v \implies T(u) \ne T(v). \]
So let's take the two definitions we've just made and link them together: 


Theorem 67 


A linear operator $T$ is injective if and only if $\text{null}(T) = \{0\}$.


We know that the null space always contains the zero vector, and this theorem says that injectivity forces that to 


be the entire null space! 


Proof. We need to show both directions here. If T is injective, then for any vector u in the null space, T(u) = T(0) 
implies that u = 0. And thus the only vector that can be in the null space is 0, as desired. 


On the other hand, suppose that we know that the null space is just the zero vector. Then

\[ T(u - v) = 0 \implies u - v = 0, \]

because $u - v$ is in the null space. But now $T(u - v) = T(u) - T(v)$, so we can restate the above statement as

\[ T(u) = T(v) \implies u = v, \]

which is indeed the definition of injectivity.


With this, we've now related the words “null space” and “injective,” and it’s clear how the two are related. 


Definition 68 


The range of a linear operator T is the set of vectors of the form Tv, where v € V. 


Basically, we try applying $T$ to everything, and we see which vectors we get. This set of vectors again has some additional structure: it is also a subspace of $V$! This requires a little bit of thinking: if two vectors $u', v'$ are in the range of $T$, then there are vectors $u, v$ such that $T(u) = u'$ and $T(v) = v'$, so $T(u + v) = u' + v'$. Thus the sum of two vectors in the range of $T$ is also in the range of $T$, and the scalar multiplication closure follows similarly.


Definition 69 


A linear operator is surjective if range T =V. 


In other words, our operator reaches the whole vector space. 


Example 70 


Consider the left and right shift operators from earlier in the class, defined via

\[ L(x_1, x_2, \dots) = (x_2, x_3, \dots), \qquad R(x_1, x_2, \dots) = (0, x_1, x_2, \dots). \]

Let's try to extract a few of the elementary properties of these operators.


First of all, what is the null space of $L$? We need the final vector to be 0, which means that $x_2, x_3, \dots$ must all be 0, but $x_1$ can be anything:

\[ \text{null}(L) = \{(x_1, 0, 0, \dots)\}. \]

In other words, $L$ is not injective, because the null space is nonzero. (For instance, both $(1, 0, 0, \dots)$ and $(3, 0, 0, \dots)$ are sent to the same thing.) However, $L$ is surjective, because we can get any vector $(a, b, \dots)$ by starting with $(0, a, b, \dots)$.

Similarly, we can find the null space for $R$: it is just the zero vector, because we need all of $x_1, x_2, \dots$ to be zero. In other words, $R$ is injective. However, $R$ is not surjective: it’s not possible to end up with the vector $(1, 0, 0, \dots)$. In other words, the range of $R$ is smaller than $V$.

This might seem reasonable, but it is actually pretty counterintuitive: in finite-dimensional vector spaces, we won't have a situation like this where the operator is surjective but not injective, or injective but not surjective!

Now that we've introduced a lot of these ideas, we're going to introduce a fundamental result which we'll come close to proving. The key idea is that the null space $\text{null}\, T$ and the range $\text{range}\, T$ are both vector spaces (subspaces of $V$), so they have a dimension.


Theorem 71 (Rank-nullity theorem) 


For any linear operator $T$ on a finite-dimensional vector space $V$,

\[ \dim(\text{null}\, T) + \dim(\text{range}\, T) = \dim V. \]


This theorem actually turns out to hold in a more general sense too: this still holds if we have a linear map that goes from $V$ into a different vector space $W$! So this is a powerful result, and it’s important to keep in mind.

The key idea for proving this theorem is to think about the basis vectors at play here. Since $\text{null}\, T$ is a vector subspace, there is a basis $(u_1, u_2, \dots, u_m)$ for that null space. But this isn’t a basis for all of $V$ (the null space is often much smaller than $V$); it’s just a linearly independent list in $V$. But now we can extend this to a basis for the whole vector space by adding some elements $v_1, v_2, \dots, v_n$. It now suffices to show that $Tv_1, Tv_2, \dots, Tv_n$ actually form a basis for $\text{range}\, T$: then we'd know that $\dim(\text{null}\, T)$ is $m$, $\dim(\text{range}\, T)$ is $n$, and $\dim V$ is $m + n$.


Instead of going through the whole proof there, we'll just do a simple case for illustration.


Example 72 


Consider the linear operator $T = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ on a two-dimensional vector space $V = \mathbb{R}^2$.


To find the null space, we just need to find the set of vectors that are killed by $T$: we need

\[ \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} b \\ 0 \end{pmatrix} \]

to be the zero vector, so we need $b$ to be zero (but there are no restrictions on $a$). Thus the null space is the set of vectors spanned by $e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$.


On the other hand, the range of $T$ is the set of vectors that come out of applying $T$: it’s the same expression $\begin{pmatrix} b \\ 0 \end{pmatrix}$ above, so it turns out that the range of $T$ is actually spanned by the same vector $e_1$. That might seem a bit confusing, but remember that the argument for the rank-nullity theorem above tells us that some of our basis vectors of $V$ should form a basis for the null space (in this case just $e_1$), and $T$ applied on the rest of them (in this case just $e_2$) should form a basis for the range. And indeed, $Te_2 = e_1$ is a basis for the range, and we've verified that the rank-nullity theorem holds in this specific case.
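For matrices that are too large to inspect by eye, the same check can be automated. The following is a rough sketch, using exact fractions and Gaussian elimination, that computes a basis of the null space and the rank (the dimension of the range) of a matrix; the helper names `rref`, `null_space_basis`, and `apply` are made up here and are not from the lecture.

```python
from fractions import Fraction


def rref(matrix):
    """Return (reduced row echelon form, pivot columns), using exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    pivots = []
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it is a free column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [x / rows[r][col] for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
        if r == len(rows):
            break
    return rows, pivots


def null_space_basis(matrix):
    """One basis vector of the null space per free (non-pivot) column."""
    reduced, pivots = rref(matrix)
    n = len(matrix[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        vec = [Fraction(0)] * n
        vec[free] = Fraction(1)
        for row, p in zip(reduced, pivots):
            vec[p] = -row[free]
        basis.append(vec)
    return basis


def apply(matrix, vec):
    """Matrix times column vector."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


T = [[0, 1], [0, 0]]
rank = len(rref(T)[1])              # dimension of the range
nullity = len(null_space_basis(T))  # dimension of the null space
```

For the operator of Example 72 this gives a null space spanned by $e_1$, rank 1, and `rank + nullity` equal to the dimension 2 of $V$.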

With this, we're now ready to move on to something more concrete: the matrix representation of our linear 
operators. This is an interesting phrase — one important idea is that our linear operators already exist, independent 
of whether or not we’re representing them. One analogy is that we can take a picture of some object which already 
exists, which makes it easier to describe and work with. But we can also take pictures from different angles to get 
different pictures, and that corresponds to different matrix representations of the same operator. So mathematicians 


don't always like matrix representations, but they're practically very helpful. 


Fact 73 


We need to choose a basis in a vector space before we can construct a matrix representation. And our result 


does depend on the basis that we pick. 


We'll abuse some notation here and often say that our linear operators T are “equal” to some matrix. But this is 
just a warning that whenever we do this, we should make sure we understand what basis we're using! 

What's perhaps most confusing about matrices when they are first introduced in a math class is that while we do add matrices component by component, we don't do the same for multiplication. So one of the goals of today is to show why the weird, complicated expression for matrix multiplication (taking the $i$th row and $j$th column and multiplying component-wise there) makes sense!

So we'll start by choosing some basis $\{v\} = (v_1, \dots, v_n)$ for our vector space $V$. We know that for any vector $v_j$ in our basis, $Tv_j$ must still live in our vector space, so we can write it as some linear combination of the basis vectors:

\[ Tv_j = T_{1j}v_1 + T_{2j}v_2 + \cdots + T_{nj}v_n = \sum_i T_{ij}v_i. \]

These numbers $T_{ij}$ are going to be the numbers that go into our matrix representation for $T$, because they carry all of the information about our operator: we just need to know where all of our basis vectors go! So we'll write that

\[ T = \begin{pmatrix} T_{11} & T_{12} & \cdots & T_{1n} \\ T_{21} & T_{22} & \cdots & T_{2n} \\ \vdots & \vdots & & \vdots \\ T_{n1} & T_{n2} & \cdots & T_{nn} \end{pmatrix}, \]

where $T_{ij}$ goes in the $i$th row and $j$th column of our matrix. The whole point of this is that knowing the basis and knowing the operator will give us the matrix representation, and this might clear up some confusion: we do not need to know anything about a dual basis or bras and kets to define a matrix representation for $T$!


Remark 74. If we want to mention that these matrix entries depend on our basis, we may denote the entries as $T_{ij}(\{v\})$.


Let’s use this to start understanding matrix multiplication: first of all, consider some vector

\[ v = \sum_i a_i v_i = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}. \]

Suppose we have a linear operator $T$: this will send our vector $v$ to some other vector

\[ b = Tv = T\sum_i a_i v_i. \]

Since $T$ is a linear operator, we can break this up into each of the individual parts: thus

\[ b = \sum_i a_i (Tv_i) = \sum_i a_i \sum_p T_{pi} v_p, \]

where in the last equality we've written out the expression for $Tv_i$. And now we can just switch the order of summation so that this looks more familiar to us:

\[ b = \sum_p \left( \sum_i T_{pi} a_i \right) v_p. \]

So now we've figured out a way to write down $b$ as a linear combination of the basis vectors! So the coefficients must satisfy

\[ b_p = \sum_i T_{pi} a_i, \]

and now we've derived the familiar expression for multiplication of a matrix by a vector.


Fact 75


By the way, note that the identity operator has a nice matrix representation: because $Iv_j = v_j$ for any basis vector, $I_{ij}$ is 1 when $i = j$ and 0 otherwise, and thus the operator is a diagonal matrix with 1s on the diagonal and 0s everywhere else. And the zero operator has a simple matrix representation as well: it just has zeros everywhere.
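To make the recipe "apply the operator to each basis vector and record the coefficients" concrete, here is a small sketch. As an assumption for the example, it uses the derivative operator restricted to polynomials of degree at most 3, with the basis $(1, x, x^2, x^3)$, so that a polynomial is a list of four coefficients; the function names are invented for the illustration.

```python
def derivative(coeffs):
    """Derivative of c0 + c1 x + c2 x^2 + ..., kept as a list of the same length."""
    n = len(coeffs)
    return [(k + 1) * coeffs[k + 1] for k in range(n - 1)] + [0]


def matrix_of(operator, dim):
    """Build the matrix whose column j holds the coordinates of operator(v_j)."""
    columns = []
    for j in range(dim):
        basis_vector = [1 if i == j else 0 for i in range(dim)]
        columns.append(operator(basis_vector))
    # Transpose so that entry [i][j] is T_ij (row i, column j).
    return [[columns[j][i] for j in range(dim)] for i in range(dim)]


def mat_vec(matrix, a):
    """Compute b_p = sum_i T_pi a_i."""
    return [sum(row[i] * a[i] for i in range(len(a))) for row in matrix]


D = matrix_of(derivative, 4)
a = [5, 0, -2, 7]            # 5 - 2x^2 + 7x^3
via_matrix = mat_vec(D, a)   # coordinates of the derivative, from the matrix
direct = derivative(a)       # the same thing computed directly
```

The two results agree, which is the statement $b_p = \sum_i T_{pi} a_i$ in action: the matrix was built only from the action on the four basis vectors, yet it reproduces the operator on any polynomial in the space.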


So now we're ready for the more general case: multiplying two matrices together. Suppose we have two linear operators $T, S$ acting on the basis vector $v_j$. We can start writing this out: since $TS$ is an operator, we know that (by definition)

\[ (TS)v_j = \sum_p (TS)_{pj} v_p. \]

Our goal is to show that this entry $(TS)_{pj}$ can be written in the usual matrix-multiplication form. To do that, note that

\[ TSv_j = T\sum_k S_{kj} v_k, \]

where we've done the action of $S$ on $v_j$, and now we can bring the $T$ inside the sum by linearity: this is thus equal to

\[ \sum_k S_{kj} Tv_k = \sum_k S_{kj} \sum_p T_{pk} v_p. \]

We now use the same trick as before: swap the order of summation, and we find that

\[ TSv_j = \sum_p \left( \sum_k T_{pk} S_{kj} \right) v_p. \]

(We flipped the order of $T_{pk}$ and $S_{kj}$, which is fine because they're both numbers.) But the expression in parentheses here serves the same purpose as $(TS)_{pj}$, and thus we have a formula for the entries of $TS$:

\[ (TS)_{pj} = \sum_k T_{pk} S_{kj}. \]

And this is indeed matrix multiplication: we're looking at the $p$th row of $T$ and the $j$th column of $S$ and multiplying component-wise there! And notice that we've now given a natural explanation, using linear algebra, of why matrix multiplication is defined the way it is: this is the only way to make sure the operator $TS$ is consistent with applying $S$, then $T$.
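The consistency claim can be checked directly on a small example. The sketch below implements $(TS)_{pj} = \sum_k T_{pk} S_{kj}$ and compares multiplying by the product matrix with applying $S$ and then $T$; the two matrices and the vector are arbitrary numbers chosen for the illustration.

```python
def mat_vec(matrix, a):
    """Compute b_p = sum_i M_pi a_i."""
    return [sum(row[i] * a[i] for i in range(len(a))) for row in matrix]


def mat_mul(t, s):
    """(TS)_pj = sum_k T_pk S_kj: row p of T against column j of S."""
    inner = len(s)
    return [[sum(t[p][k] * s[k][j] for k in range(inner)) for j in range(len(s[0]))]
            for p in range(len(t))]


T = [[1, 2], [0, 1]]
S = [[0, 1], [3, -1]]
a = [4, 5]

product_then_apply = mat_vec(mat_mul(T, S), a)   # (TS) a
apply_twice = mat_vec(T, mat_vec(S, a))          # T (S a)
```

Both routes give the same vector, and computing `mat_mul(S, T)` as well shows that the two orders of multiplication produce different matrices, so these particular operators do not commute.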

Our last topic for this lecture will be that of a change of basis. We said earlier that matrices provide a representation 
of linear operators on a vector space, but we may want to pick different bases in different scenarios, which lead to 
different matrices — thus, we need a way of converting between the bases. And in this study, we'll find that there is 
some information in our matrix that is independent of the basis that we choose! 


As we said earlier, we have

\[ Tv_j = \sum_i T_{ij} v_i, \]

where this result depends on the basis $\{v\}$ we've chosen for the situation. So now we'll have two different bases: $\{v\} = (v_1, \dots, v_n)$ and $\{u\} = (u_1, \dots, u_n)$, and now we need to define some new operators. We'll let $A$ take $\{v\}$ to $\{u\}$, and we'll let $B$ take $\{u\}$ to $\{v\}$: we can write this as

\[ u_k = Av_k, \qquad v_k = Bu_k \]

for all $1 \le k \le n$. Here $A$ and $B$ are linear operators: for example, they take the third basis vector in $\{v\}$ to the third basis vector in $\{u\}$, and vice versa. So now we can calculate

\[ BAv_k = Bu_k = v_k, \]

and similarly

\[ ABu_k = Av_k = u_k, \]

so $BA$ and $AB$ are the identity operators, which means that $A$ and $B$ are in fact inverses of each other.

The point of introducing these operators $A$ and $B$ is to see how we can relate the matrix elements $T_{ij}(\{v\})$ and $T_{ij}(\{u\})$: that is, how can we calculate the entries of matrices in one basis, given the entries in the other basis?

First of all, notice that we've defined our basis-changing linear operators $A$ and $B$, and we might want to write down matrix representations for them. But we have two bases, so which one should we use for the representations?


Wonderfully, it doesn’t actually matter: 


Proposition 76 


The matrix representations of A, B (our basis-changing operators) are the same in {v} and in {u}. 


Proof. We know that

\[ Av_k = \sum_i A_{ik}(\{v\}) v_i \]

and

\[ Au_k = \sum_i A_{ik}(\{u\}) u_i \]

by definition, and we want to show that these coefficients $A_{ik}$ are the same. To do this, note that

\[ Au_k = A(Av_k) = A\sum_i A_{ik}(\{v\}) v_i \]

(by plugging in the definition of $Av_k$ from earlier), and now $A_{ik}(\{v\})$ are just numbers, so we can bring the $A$ inside the sum to get

\[ Au_k = \sum_i A_{ik}(\{v\})(Av_i) = \sum_i A_{ik}(\{v\}) u_i. \]

But looking at the two expansions of $Au_k$ in terms of the $u_i$, they are identical except for the basis that labels the coefficients, so we indeed have that $A_{ik}$ is the same in both bases, as desired. (The same argument works for $B$.)


In the same spirit, we know that $A$ and $B$ are inverses of each other, so we know that

\[ \sum_j B_{ij} A_{jk} = \delta_{ik} = \sum_j A_{ij} B_{jk}. \]

And in this kind of statement, we again don't need to write down the basis that we're using!

So now we're ready to answer the main question. We'll find the matrix entries $T(\{u\})$ in terms of $T(\{v\})$ and the matrix $A$ (we could also use the matrix $B$). To unclutter notation, we'll use the repeated index convention (where a repeated index means we sum over that index). We'll need to do a bit of computation: we have the sum

\[ Tu_k = T_{ik}(\{u\}) u_i \]

by definition, and we need to involve the $v$-vectors somehow: replacing $u_k = Av_k$ on the left hand side, we note that this expression is also equal to $TAv_k$. And now letting $A$ act on $v_k$, we find that

\[ Tu_k = TAv_k = TA_{jk} v_j, \]

and now we can have the operator $T$ act on the $v_j$s to find

\[ Tu_k = A_{jk} T_{pj}(\{v\}) v_p. \]

This isn't quite what we want, though: we have $v$ vectors in the second expression, so we need to rewrite this in terms of the $u$ vectors. And because $v_p = B_{ip} u_i$ (rewriting so that we match up the indices on the $u$ vectors), this means that

\[ T_{ik}(\{u\}) u_i = A_{jk} T_{pj}(\{v\}) B_{ip} u_i, \]

and finally using the fact that $B$ is the inverse matrix of $A$, we arrive at our result (also reshuffling our numbers a bit):

\[ T_{ik}(\{u\}) u_i = (A^{-1})_{ip} T_{pj}(\{v\}) A_{jk} u_i. \]

But this means that the matrix $T(\{u\})$ in the $u$-basis is just $A^{-1} T(\{v\}) A$, since all of the $ik$-entries line up! And thus we've arrived at our main result:

\[ T(\{u\}) = A^{-1} T(\{v\}) A, \]

where $A$ is the base change matrix such that $u_k = Av_k$. (And this operation of multiplying with the inverse on the left and the matrix on the right is called conjugation: it'll come up again.)

With this, we can prove that some properties of our linear operators do not depend on the basis we choose.


Proposition 77 


The trace of a matrix representing a linear operator, which is defined to be the sum of the diagonal entries of that


matrix, is basis-independent. 


Proof. To show this, we need to know a few important properties of trace: in particular, we have

$$\operatorname{tr}(T_1 T_2) = \operatorname{tr}(T_2 T_1),$$

and more generally the trace is actually cyclic: we can show by computing some coefficients of matrix multiplication that

$$\operatorname{tr}(T_1 T_2 \cdots T_n) = \operatorname{tr}(T_n T_1 \cdots T_{n-1}).$$

So we can apply this to our base change formula above:

$$\operatorname{tr}(T(\{u\})) = \operatorname{tr}(A^{-1}\, T(\{v\})\, A) = \operatorname{tr}(A A^{-1}\, T(\{v\})) = \operatorname{tr}(T(\{v\})),$$

where we've used cyclicity in the middle equality.


Proposition 78 


The determinant of a matrix representing a linear operator is basis-independent.


Proof. For this, we also need to remember the property that

$$\det(AB) = \det A \,\det B$$

for two matrices $A$ and $B$. (In particular, this shows that $\det A \,\det A^{-1} = \det I = 1$.) This generalizes easily to show that

$$\det(A_1 \cdots A_m) = \det A_1 \cdots \det A_m,$$

and thus we can plug this into our base change formula again:

$$\det T(\{u\}) = \det(A^{-1}\, T(\{v\})\, A) = \det A^{-1} \,\det T(\{v\}) \,\det A = \det T(\{v\}),$$

as desired.
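As a quick sanity check of the last two propositions, here is a minimal sketch in plain Python. The matrices `T_v` and `A` are arbitrary illustrative choices (not from the lecture), and the helper names are hypothetical. Exact fractions are used so that the comparison is not affected by rounding.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_2x2(M):
    """Inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def det_2x2(M):
    (a, b), (c, d) = M
    return a * d - b * c

# Illustrative operator in the v-basis and an invertible change-of-basis matrix.
T_v = [[F(2), F(1)], [F(0), F(3)]]
A = [[F(1), F(2)], [F(1), F(3)]]

# Conjugation: T({u}) = A^{-1} T({v}) A.
T_u = matmul(matmul(inverse_2x2(A), T_v), A)

print(trace(T_v), trace(T_u))      # the two traces agree
print(det_2x2(T_v), det_2x2(T_u))  # the two determinants agree
```

The individual entries of `T_u` differ from those of `T_v`, but the trace and determinant do not.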


We'll see soon that the trace and determinant, along with a few other invariants, carry intrinsic information about 


our operators! 


8 February 12, 2020 


We're doing a bit more linear algebra now — we'll practice some of the ideas about matrix representation and linear 
operators today during class. (As a reminder, we also have a homework assignment due on Friday.) 

Last time, we talked about direct sums of vector spaces, and we'll illustrate that idea again with the central 
potentials. This is a typical situation in three dimensions, where the potential only depends on the magnitude of the 
position vector: 

$$V(\vec{r}) = V(r), \qquad r = |\vec{r}|.$$


This means we have a spherical symmetry. The most famous example is the hydrogen atom, but the spherical square well and Morse potential are key ideas in physics as well. These have all been studied, and the way they work out is that the basic separable solutions follow the ansatz

$$\psi_{E\ell m}(\vec{r}) = \frac{u_{E\ell}(r)}{r}\, Y_{\ell m}(\theta, \phi).$$


This is called a basic solution, and it's indexed by $E, \ell, m$. Here, the function $u = u_{E\ell}$ satisfies the Schrodinger equation

$$-\frac{\hbar^2}{2m}\frac{d^2 u}{dr^2} + \left(\frac{\hbar^2\,\ell(\ell+1)}{2mr^2} + V(r)\right) u = E u$$


(basically, we have an effective potential, which serves as a centrifugal barrier). This is now a one-dimensional problem, 
and that’s the advantage of working in this system! There's only one confusion we should be careful about — the indexing 
by m is the quantum number, not the mass. And this holds for any kind of central potential — the general solution is 


going to be a superposition of these energy eigenstates. 


Remark 79. Where does the centrifugal term come from? The actual Schrodinger equation we want to solve in general has a Laplacian operator

$$-\frac{\hbar^2}{2m}\nabla^2\psi + V\psi = E\psi,$$

and if we expand out the Laplacian in spherical coordinates, the angular part acts on the $Y_{\ell m}$s in a particularly nice way.


By the way, what are our bounds for $E, \ell, m$? $\ell$ can be any nonnegative integer $0, 1, \dots$, and for any fixed $\ell$, $m$ is an integer between $-\ell$ and $\ell$. And to find the energy, we need to solve the wave equation: this will quantize the allowed energy states, and it will tell us the indices for $E$.

One more comment: the reason we have a $\frac{1}{r}$ factor is that $u_{E\ell}$ satisfies a nicer wave equation than $\psi$ itself. A nice bonus is that when we're trying to normalize our wavefunction, the $r^2$ in the denominator from $|\psi|^2$ cancels out with the $r^2\,dr$ term from the spherical $d^3x$, so the wavefunction almost "normalizes itself!"
So let's go back to the linear algebra here: we were looking at direct sums. If we fix different values of $\ell$, we get slightly different wave equations, and they each give slightly different allowed (quantized) values of $E$. We know that the energy levels will be higher for larger values of $\ell$ (because the effective potential is larger), but the point is that if we draw a diagram with $\ell$ on one axis and $E$ on the other axis, we get the spectrum of all energy eigenstates.

Sometimes, different values of $\ell$ will give the same value of $E$, and those give us degeneracies which need to be explained. But regardless, if we want to represent our total state space, we can write it as

$$\mathcal{H} = \mathcal{H}_{\ell=0} \oplus \mathcal{H}_{\ell=1} \oplus \cdots = \bigoplus_{\ell=0}^{\infty} \mathcal{H}_\ell,$$


which means that the whole set of basis vectors (energy eigenstates) can be decomposed into those from $\ell = 0$, those from $\ell = 1$, and so on, and they're all linearly independent. But we can further decompose each $\mathcal{H}_\ell$ into its different energy eigenstates:

$$\mathcal{H}_\ell = \mathcal{H}_{\ell, E_0} \oplus \mathcal{H}_{\ell, E_1} \oplus \cdots = \bigoplus_{k=0}^{\infty} \mathcal{H}_{\ell, E_k}.$$


This means that we can write the whole state space as

$$\mathcal{H} = \bigoplus_{\ell=0}^{\infty} \bigoplus_{k=0}^{\infty} \mathcal{H}_{\ell, E_k},$$


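but we're still not done breaking everything up: for example, $\ell = 1$ allows the quantum number $m$ to be $-1$, $0$, or $1$. So really,

$$\mathcal{H}_{\ell, E_k} = \bigoplus_{m=-\ell}^{\ell} \mathcal{H}_{\ell, E_k, m},$$

and now $\mathcal{H}_{\ell, E_k, m}$ is a one-dimensional vector space, spanned by just one basis vector. If we substitute this back into our equation for $\mathcal{H}$, we've completely decomposed our state space:

$$\mathcal{H} = \bigoplus_{\ell=0}^{\infty} \bigoplus_{k=0}^{\infty} \bigoplus_{m=-\ell}^{\ell} \mathcal{H}_{\ell, E_k, m}.$$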


One question: what if we have an energy degeneracy, so two different eigenstates (with different numbers $\ell, m$) have the same energy $E$? That doesn't matter: the $\mathcal{H}_{\ell_1, E, m_1}$ and $\mathcal{H}_{\ell_2, E, m_2}$ states are still linearly independent, because they're different vectors. It's just important to remember that degenerate states do differ in some way. Otherwise, they'd be indistinguishable from each other!

We'll now move on to talking about Pauli matrices: we'll probably have them memorized by the end of 8.051 with 


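all of the exercises we're doing with them. We have the universal conventions

$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$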


Note that these matrices have a few properties: 


• Pauli matrices are Hermitian, which means they are equal to their (complex) conjugate transpose. In fact, $\sigma_x, \sigma_y, \sigma_z$, and the identity matrix $I$ form an $\mathbb{R}$-basis for the set of $2 \times 2$ Hermitian matrices (it's good to think of these as an $\mathbb{R}$-vector space, because multiplying Hermitian matrices by real numbers still gives us Hermitian matrices, but this is not true for complex numbers). To show that this is true, note that the most general $2 \times 2$ Hermitian matrix can be written as

$$\begin{pmatrix} a_0 + a_3 & a_1 - i a_2 \\ a_1 + i a_2 & a_0 - a_3 \end{pmatrix} = a_0 I + a_1 \sigma_x + a_2 \sigma_y + a_3 \sigma_z$$

with real numbers $a_0, a_1, a_2, a_3$. (The diagonal entries should be real, and the off-diagonal entries should be complex conjugates.) We often refer to $\sigma_x, \sigma_y, \sigma_z$ as $\sigma_1, \sigma_2, \sigma_3$, respectively. By the way, the reason we write the diagonal entries as $a_0 + a_3$ and $a_0 - a_3$, instead of just using two numbers $b_0$ and $b_3$, is that we want to write everything as a linear combination of Hermitian matrices.


It's important to re-emphasize that this is a real vector space of dimension 4, even though the matrices are full 


of imaginary numbers! That's why we don’t say that the “vectors” (matrices in this case) are real or complex. 


• Pauli matrices are traceless (the trace is the sum of the diagonal entries). They all have determinant $-1$ as well. Why is it important that these matrices are traceless? We should think of Pauli matrices as spin-1/2 operators, and they have two kinds of states (spin up and spin down). This means we should have two eigenvalues, and the trace (which is also the sum of the eigenvalues) being zero tells us that the eigenvalues are $+\lambda$ and $-\lambda$.


• What happens when we square the Pauli matrices? It turns out that $\sigma_i^2 = I$, which is a fundamental property as well. The importance of this property is that if a matrix satisfies an equation, its eigenvalues satisfy the same equation: this tells us that $\lambda^2 = 1$, so the eigenvalues are $+1$ and $-1$.


• Finally, the Pauli matrices are unitary, which means that $U^\dagger U = I$. There are more pictorial ways to think about unitarity, but this is a good one to start with. Indeed, we can check that

$$\sigma_i^\dagger \sigma_i = \sigma_i \sigma_i = I$$

(because the $\sigma_i$ are Hermitian, $\sigma_i^\dagger = \sigma_i$).


One fundamental property of the Pauli matrices that we should internalize is the product formula

$$\boxed{\sigma_i \sigma_j = \delta_{ij}\, I + i\,\epsilon_{ijk}\,\sigma_k},$$

where the repeated index $k$ on the right-hand side is summed.
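The properties listed above are easy to check by brute force. The following is a minimal sketch in plain Python, assuming the standard Pauli matrix conventions; the helper names (`matmul`, `dagger`, `levi_civita`, `product_rhs`) are hypothetical and exist only for this illustration.

```python
# Standard Pauli matrices as lists of rows (Python writes the imaginary unit as j).
I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]
PAULIS = [SX, SY, SZ]

def matmul(X, Y):
    return [[sum(X[r][k] * Y[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def dagger(M):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(M[c][r]).conjugate() for c in range(2)] for r in range(2)]

def trace(M):
    return M[0][0] + M[1][1]

def det(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def levi_civita(i, j, k):
    """Totally antisymmetric symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def product_rhs(i, j):
    """delta_ij * I + i * sum_k eps_ijk * sigma_k as a 2x2 matrix."""
    delta = 1 if i == j else 0
    return [[delta * I2[r][c]
             + 1j * sum(levi_civita(i, j, k) * PAULIS[k][r][c] for k in range(3))
             for c in range(2)] for r in range(2)]

hermitian = all(dagger(s) == s for s in PAULIS)
traceless = all(trace(s) == 0 for s in PAULIS)
det_minus_one = all(det(s) == -1 for s in PAULIS)
square_to_identity = all(matmul(s, s) == I2 for s in PAULIS)
product_formula = all(matmul(PAULIS[i], PAULIS[j]) == product_rhs(i, j)
                      for i in range(3) for j in range(3))

print(hermitian, traceless, det_minus_one, square_to_identity, product_formula)
```

Because every entry is $0$, $\pm 1$, or $\pm i$, the arithmetic is exact and the matrices can be compared with `==` directly.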
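So let's think about an operator $O$ acting on spin states $|+\rangle$ and $|-\rangle$: we can represent them as $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$, respectively. Suppose the operator satisfies

$$O \begin{pmatrix} 1 \\ 0 \end{pmatrix} = O|+\rangle = \alpha|+\rangle + \beta|-\rangle = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}$$

and

$$O \begin{pmatrix} 0 \\ 1 \end{pmatrix} = O|-\rangle = \gamma|+\rangle + \delta|-\rangle = \begin{pmatrix} \gamma \\ \delta \end{pmatrix}.$$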


Because $|+\rangle$ and $|-\rangle$ are the basis vectors and $O$ is a linear operator, this tells us everything we want to know about the operator. But perhaps we want a formula that tells us how to act on every vector in our vector space: that's where the matrices come in. It's dangerous, though, to say explicitly what the operator does to every vector, because we need to check linearity! For example, we can prove that no operator exists which reverses every single spin state, because that would violate linearity. So an operator should generally be specified just by its action on our basis vectors.


The best way to write $O$ as a matrix is to just look at the two equations above and see what we need to satisfy: we'll find that we basically put the images of the basis vectors down as our column vectors:

$$O = \begin{pmatrix} \alpha & \gamma \\ \beta & \delta \end{pmatrix}.$$


As a challenge for next time, we can try to write down a matrix representation for the operator 


$$|\vec{n}; +\rangle\langle \vec{n}; +|.$$


9 Linear Algebra — Vector Spaces and Operators, Part 3 


We started talking about linear operators on vector spaces last time, and the main things we discussed were their rough features: for instance, we've discovered a few things about an operator's null space and range. We now want to ask
a few more questions, primarily centered around eigenvalues and eigenvectors. 

The physics motivation here is that our operators are observables, and eigenvalues tell us a specific possible value 
of a measurement of that observable. Since these are crucial properties in quantum mechanics, we should make 
sure we understand the mathematics here too. 


As before, we'll be working with linear operators $T \in \mathcal{L}(V)$ on a vector space $V$.


Definition 80 


A subspace $U$ of a vector space $V$ is an invariant subspace if $T(u) \in U$ for all vectors $u \in U$. (Another way to write this is that the set $T(U) \subset U$.)


The idea here is that applying $T$ keeps us inside a subspace, so we have a more degenerate (but still interesting) representation of our linear operator. We should remember that being an invariant subspace is an idea connected to an operator: an invariant subspace doesn't just exist on its own!


Example 81 


We always have two trivial examples of invariant subspaces: the subspace $\{0\}$ containing only the zero vector is invariant (because the zero vector is sent to itself), and the whole vector space is also invariant (because $T$ takes the vector space to itself).


These aren't very interesting, so let's try to construct an example of a more interesting invariant subspace. We know that the zero subspace $\{0\}$ has dimension 0, while the whole vector space $V$ has full dimension $\dim V$: let's try to get something in between by considering one-dimensional invariant subspaces.

Every one-dimensional subspace can be generated by a single vector $u \in V$: the space generated by this vector is the set $U = \{cu : c \in \mathbb{F}\}$ (we can scale the vector by any number, so we have a line through the origin). Because the


basis has one vector, this does indeed have dimension 1 by definition. 


Fact 82 


It's important that u is not the zero vector: otherwise, we won't actually generate a one-dimensional subspace. 


So if we want $U$ to be invariant, that means that

$$Tu \in U \implies \boxed{Tu = \lambda u}$$

for some number $\lambda \in \mathbb{F}$. It turns out this is an extremely important equation: such vectors are invariant up to a scalar factor under the action of $T$.


Definition 83 


Let $T$ be a linear operator on an $\mathbb{F}$-vector space $V$. If there is a vector $u \in V$ and a scalar $\lambda \in \mathbb{F}$ such that

$$Tu = \lambda u, \qquad u \neq 0,$$

then $\lambda$ is an eigenvalue of $T$, and $u$ is its associated eigenvector.


The reason we don't allow $u = 0$ is that the equation is then satisfied for any $\lambda$: it's not an interesting equation. So eigenvalues need to correspond to nonzero vectors, though it is okay for eigenvalues themselves to be zero. In fact, 0 is often an interesting eigenvalue: that means that we have some nonzero vector $u$ which is killed by $T$, and that tells us something about the null space.

So suppose we have such an eigenvector $u$: then every nonzero vector in the span of $u$, which is the set of vectors of the form $cu$ (for $c \in \mathbb{F}$), is an eigenvector. After all, if $Tu = \lambda u$, it's okay for us to multiply both sides of the equation by any constant $c$. That means that we'll often say that "the span of $u$ is an eigenvector" (even though we don't actually want to include the zero vector).

Sometimes we'll actually get a funny situation: we might have a particular value of $\lambda$ for which more than one independent vector solves the equation $Tu = \lambda u$. Then we have a degeneracy, where a given eigenvalue has more than one independent eigenvector, and then our invariant subspace is larger than one dimension! For example, if $u_1$ and $u_2$ both have eigenvalue $\lambda$, then every vector in the span of $u_1$ and $u_2$ will also have eigenvalue $\lambda$, and then some interesting complications will occur. But there's a lot of physics associated with this idea, so we should keep it in mind.


Definition 84 


The spectrum of an operator is its set of eigenvalues. 


We will want to find a way to solve for the eigenvalues $\lambda$: notice that we can rewrite the equation

$$Tu = \lambda u \implies (T - \lambda I)u = 0.$$

So the eigenvalue condition actually tells us that there is a nonzero vector $u$ that is killed by the operator $T - \lambda I$, and in particular this means that $T - \lambda I$ is not injective: its null space is not just the zero vector. And when we have a finite-dimensional vector space, this actually means that $T - \lambda I$ is also not surjective and not invertible. And the eigenvectors of $T$ with eigenvalue $\lambda$ correspond exactly to the null space of $T - \lambda I$. (And this explains why, since the null space always includes the zero vector, it's convenient to just include it as a "soft" eigenvector.)


Example 85 


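When is $\lambda = 0$ an eigenvalue of an operator $T$?


The eigenvectors of eigenvalue 0 are those nonzero vectors for which $Tu = 0u = 0$: thus, the null space of $T$ consists of the eigenvectors of eigenvalue 0 (together with the zero vector), and $\lambda = 0$ is an eigenvalue exactly when this null space contains a nonzero vector.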


Example 86 


We've been talking about properties of matrix representations that are basis-independent: are the eigenvalues 


and eigenvectors basis-independent? Are invariant subspaces basis-independent? 
None of these concepts needs a basis in order to be defined, so they are indeed basis-independent. However, we
should be careful to note that eigenvectors might be represented differently in different bases. 

To summarize this first pass through eigenvectors and eigenvalues, we got to this idea by exploring invariant 
subspaces, particularly those with one dimension. Then the eigenvector spans our invariant subspace, and the 
eigenvalue tells us how the subspace behaves under the action of our operator $T$: everything is just scaled. And when


we're working in complex vector spaces, knowing all of the eigenvalues and eigenvectors tells us a lot of information! 
Remark 87. The next few pages are optional material. 


Let's try to gain some intuition for what’s happening geometrically with eigenvalues and eigenvectors: 


Example 88 


Consider an operator in $V = \mathbb{R}^3$ which rotates vectors: explicitly, consider a rotation around the $z$-axis. What are


the eigenvalues and eigenvectors of this operator? 


We know that all vectors that are not along the $z$-axis will be rotated (so they end up in a different direction), but all vectors along the $z$-axis are left invariant. So in $\mathbb{R}^3$, there is only one eigenvector: the one along the $z$-direction (or more precisely, the span of $(0,0,1)$). This eigenvector has $\lambda = 1$, because there is no scaling on the $z$-axis.

But are there other eigenvectors? The answer is no, as long as we're working in the real numbers (and the rotation angle is not a multiple of $\pi$). If we try calculating eigenvectors and eigenvalues mathematically here, we'll find that the other potential vectors end up with complex coefficients: those aren't allowed when we have a real vector space! Similarly, if we use the example $V = \mathbb{R}^2$ and do such a rotation $T$ in the plane, we will actually have no invariant directions and thus no eigenvectors at all! So real vector spaces have this kind of complication, and that's a reason why we like complex vector spaces better. We'll get some better results: there's always at least one eigenvalue, and often many more if our operators are nice. And if we restrict ourselves to a special class of operators, the eigenvalues will be real, which is what we need for physical observables to make sense!
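To see the complex eigenvalues appear, here is a minimal sketch in plain Python for a rotation of the plane by an angle `theta`. It assumes the usual rotation matrix with rows $(\cos\theta, -\sin\theta)$ and $(\sin\theta, \cos\theta)$, whose trace is $2\cos\theta$ and whose determinant is 1, and it solves the resulting quadratic $\lambda^2 - 2\cos\theta\,\lambda + 1 = 0$. The function name is hypothetical.

```python
import cmath
import math

def rotation_eigenvalues(theta):
    """Return (discriminant, (lam1, lam2)) for a plane rotation by theta."""
    tr = 2 * math.cos(theta)   # trace of the rotation matrix
    det = 1.0                  # determinant of any rotation matrix
    disc = tr * tr - 4 * det   # discriminant of lam^2 - tr*lam + det = 0
    root = cmath.sqrt(disc)    # complex square root, valid for negative disc
    return disc, ((tr + root) / 2, (tr - root) / 2)

disc, (lam1, lam2) = rotation_eigenvalues(math.pi / 2)
print(disc)        # negative, so there is no real solution
print(lam1, lam2)  # a complex-conjugate pair on the unit circle
```

A negative discriminant is exactly the statement that no real $\lambda$ works, while over the complex numbers the two roots are $e^{\pm i\theta}$.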

Here's an important piece of intuition: eigenvectors of different eigenvalues are linearly independent, and we'll 
actually make a stronger statement soon: they'll be orthogonal once we define an inner product. But let's prove what 


we can for now: 


Theorem 89 


Let $T \in \mathcal{L}(V)$ be a linear operator, and let $\lambda_1, \dots, \lambda_n$ be distinct eigenvalues with corresponding (nonzero) eigenvectors $u_1, \dots, u_n$. Then the eigenvectors are linearly independent.


The reason we care about this is that we often want our eigenvectors to span our vector space, which can be a 
useful thing to have! Again, note that we can’t define orthogonality yet because we don’t have an inner product on 


our space yet. 


Remark 90. Sometimes, it's possible that a given eigenvalue $\lambda$ has more than one eigenvector. Then we can pick any
of those eigenvectors to put in our theorem here. Also, since a dimension n vector space can have at most n linearly 


independent vectors, this means that T can have at most n different eigenvalues. 


Proof. We'll work by contradiction. Assume that $u_1, \dots, u_n$ are linearly dependent: then there exists some smallest $k \le n$ such that

$$u_k = a_1 u_1 + \cdots + a_{k-1} u_{k-1}.$$

Basically, we follow this procedure. We look at the first vector: is it linearly independent by itself? Yes, because it is nonzero. Then look at the first two vectors together: are they linearly independent? If not, then set $k = 2$ above; otherwise look at the first three vectors together and continue on. At the first place we stop, the coefficient of $u_k$ in the dependence relation must be nonzero (another way to say this is that $u_k$ is in the span of $u_1$ through $u_{k-1}$), and so the equation above is indeed valid.

But now $u_k$ is nonzero by definition, so some of the $a_i$s must be nonzero. Now apply the operator $T - \lambda_k I$ to this equation: since $\lambda_k$ is the eigenvalue of $u_k$, the left-hand side becomes 0. We can also simplify the right-hand side: $T - \lambda_k I$ doesn't kill any of the other $u_i$s, because they all have distinct eigenvalues. Specifically, we'll get the equation

$$0 = a_1(\lambda_1 - \lambda_k)u_1 + a_2(\lambda_2 - \lambda_k)u_2 + \cdots + a_{k-1}(\lambda_{k-1} - \lambda_k)u_{k-1}.$$

But remember that $u_1$ through $u_{k-1}$ are linearly independent (by the minimality of $k$), so for this equation to hold, all of the coefficients $a_i(\lambda_i - \lambda_k)$ must be zero. Since the eigenvalues are distinct, this means $a_1 = a_2 = \cdots = a_{k-1} = 0$. And this is a contradiction with the defining equation for $u_k$ above! Thus the eigenvectors must be linearly independent, as desired.


This is a pretty unorthodox proof, but we can now connect this with more standard discussions. We said earlier that

$$Tu = \lambda u \text{ with } u \neq 0 \implies T - \lambda I \text{ is not invertible.}$$

And a useful way to restate this is that

$$\det(T - \lambda I) = 0$$


(here we're using a fact from linear algebra that an operator that is not invertible has determinant 0). And the best 
way to work with such a statement is to find a matrix representation and calculate the determinant explicitly. That 


basically looks something like 


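$$\det \begin{pmatrix} T_{11} - \lambda & T_{12} & \cdots & T_{1N} \\ T_{21} & T_{22} - \lambda & \cdots & T_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ T_{N1} & T_{N2} & \cdots & T_{NN} - \lambda \end{pmatrix} = 0.$$

Basically, we put $-\lambda$s on the diagonal, and then the determinant $f(\lambda)$ will be an $N$th degree polynomial in $\lambda$, where $N = \dim V$. So our defining equation will look something like

$$(-\lambda)^N + b_{N-1}\lambda^{N-1} + \cdots + b_1 \lambda + b_0 = 0.$$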


Such equations will have solutions over the complex numbers $\mathbb{C}$, and this is why we like to work with complex vector spaces: we can guarantee that this has at least one solution! And most of the time, we will indeed have $N$ distinct solutions, but occasionally there are repeats: we can always factor this polynomial, called the characteristic polynomial, as

$$f(\lambda) = (-1)^N (\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_N),$$

where it's possible that $\lambda_i$ and $\lambda_j$ are the same. (That would correspond to a degeneracy.)


Definition 91 


If all eigenvalues of an operator are distinct, then we say that the spectrum is non-degenerate. On the other 


hand, if the characteristic polynomial has $(\lambda - \lambda_i)$ appearing $k_i > 1$ times, then $\lambda_i$ is a degenerate eigenvalue with multiplicity $k_i$.
And now we have a typical strategy for calculating eigenvalues and eigenvectors: use this equation $\det(T - \lambda I) = 0$ to find solutions for $\lambda$, and then we will definitely have corresponding eigenvectors (because the operator $T - \lambda I$ is not injective). But to make any more progress at this point, we're going to need to introduce some additional structure.
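The strategy above can be sketched for a $2 \times 2$ matrix, where the characteristic polynomial is $\lambda^2 - (\operatorname{tr} T)\lambda + \det T$. This is only an illustration in plain Python: the function name `eigen_2x2`, the tolerance `1e-12`, and the sample matrix are assumptions made for the example, not part of the lecture.

```python
import cmath

def eigen_2x2(T):
    """Return [(lam, u), ...] with T u = lam u for a 2x2 matrix T.

    Step 1: solve det(T - lam I) = lam^2 - tr*lam + det = 0.
    Step 2: for each root, pick a nonzero vector in the null space of T - lam I.
    A repeated root is simply returned twice.
    """
    (a, b), (c, d) = T
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    pairs = []
    for lam in ((tr + root) / 2, (tr - root) / 2):
        m00, m01, m10, m11 = a - lam, b, c, d - lam   # entries of T - lam I
        if abs(m00) > 1e-12 or abs(m01) > 1e-12:
            u = (m01, -m00)    # orthogonal to the first row; kills the second row too since det = 0
        elif abs(m10) > 1e-12 or abs(m11) > 1e-12:
            u = (m11, -m10)    # same idea using the second row
        else:
            u = (1, 0)         # T - lam I = 0, so every vector is an eigenvector
        pairs.append((lam, u))
    return pairs

T = [[2, 1], [1, 2]]
for lam, u in eigen_2x2(T):
    Tu = (T[0][0] * u[0] + T[0][1] * u[1], T[1][0] * u[0] + T[1][1] * u[1])
    print(lam, u, Tu)   # Tu should equal lam times u
```

For larger matrices one would not expand the determinant by hand like this; the point is only to show the two steps (roots first, null space second) in the smallest possible case.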

In particular, the next question that we might be asked about our vectors in our vector space is their length. And 
the object that we'll now introduce — the inner product — helps us talk about lengths, but it also helps us define lots of 
other things that are coming up in the next few lectures — Hermitian, unitary, and orthogonal operators, among many 
other things. We'll start by talking about this for real vector spaces $\mathbb{R}^n$, because that's what might be more familiar.


In this vector space, we'll denote vectors in the form

$$a = (a_1, a_2, \dots, a_n), \qquad a_i \in \mathbb{R}.$$

Recall from geometry that the length of a vector is

$$|a| = \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}.$$

This motivates the definition of a dot product: perhaps we want

$$|a|^2 = a \cdot a = a_1^2 + a_2^2 + \cdots + a_n^2.$$


So this length squared is now some kind of operation of a with itself, so we can generalize this a bit: 


Definition 92 


The (real) dot product between two vectors $a, b \in \mathbb{R}^n$ is the number

$$a \cdot b = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n.$$


This is a nice definition, because we can explicitly calculate it and also discover some properties. Our goal will be to 
do this more axiomatically: find some properties that an inner product should satisfy so that it gives us the appropriate 
structure. This way, we know that all inner products, no matter how they're constructed, will have certain desirable 
properties. 


Let's go ahead and state those axioms for a real vector space now: 

1. For any vector $a$, $a \cdot a \ge 0$. (Then it's well-defined to say that we have a length defined by $a \cdot a = |a|^2$.)

2. If $a \cdot a = 0$, then $a = 0$. This means that the only vector with zero length is the zero vector.

3. We have distributivity: $a \cdot (b + c) = a \cdot b + a \cdot c$ for vectors $a, b, c$.

4. $a \cdot (\alpha b) = \alpha (a \cdot b)$ for vectors $a, b$ and real numbers $\alpha$.

5. $a \cdot b = b \cdot a$.


The second property is very important — it will help us with maybe half of the proofs that we'll be doing in this 
class related to operators! Indeed, it’s true for the dot product we've already defined: the only way for a sum of real 
squares to be zero is if they're all zero.


These axioms do not uniquely determine the definition of a dot product: we can actually define

$$a \cdot b = c_1 a_1 b_1 + c_2 a_2 b_2 + \cdots + c_n a_n b_n,$$

as long as all of the $c_i$s are positive. We can check that it does indeed satisfy all of the above properties.
Remark 93. It is not okay for the $c_i$s to be negative or zero, though: this violates one of the axioms, and it's good for us to figure out which one that is.
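For experimenting with the weights, here is a small sketch in plain Python of the weighted dot product defined above. The function name `weighted_dot` and the sample vectors are illustrative assumptions; the code spot-checks symmetry, distributivity, and homogeneity for one choice of positive weights and then prints $a \cdot a$ for a few other weight choices so that the remark can be explored numerically.

```python
def weighted_dot(a, b, c):
    """Weighted dot product: sum of c_i * a_i * b_i over the components."""
    return sum(ci * ai * bi for ci, ai, bi in zip(c, a, b))

a, b, d = [1, -2, 3], [4, 0, -1], [2, 5, 1]
weights = [1, 2, 3]   # all positive

b_plus_d = [bi + di for bi, di in zip(b, d)]
symmetric = weighted_dot(a, b, weights) == weighted_dot(b, a, weights)
distributive = (weighted_dot(a, b_plus_d, weights)
                == weighted_dot(a, b, weights) + weighted_dot(a, d, weights))
homogeneous = (weighted_dot(a, [7 * bi for bi in b], weights)
               == 7 * weighted_dot(a, b, weights))
print(symmetric, distributive, homogeneous)

# Try a.a for the basis vector (1, 0, 0) with different first weights.
e1 = [1, 0, 0]
for trial in ([1, 2, 3], [0, 2, 3], [-1, 2, 3]):
    print(trial, weighted_dot(e1, e1, trial))
```

Integers are used throughout, so the equality checks are exact.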


Theorem 94 (Schwarz inequality) 


For any two vectors $a, b$, we have

$$|a \cdot b| \le |a|\,|b|.$$


This is not an obvious result, and we need to understand what it says. Note that the bars on the right side give us 
the lengths of the vectors a and b, but the bars on the left side give us the absolute value of the number a- b. So 
bars can mean different things — we should be careful! 

One way we might have seen this explained is that $a \cdot b = |a||b| \cos\theta$, where $\theta$ is the angle between the two vectors. And we know that $\cos\theta$ has magnitude at most 1, so indeed the magnitude of this dot product is at most $|a||b|$. But
this isn't a proof, because that’s not how we defined the dot product above! So let’s do a more rigorous proof: note 
that the Schwarz inequality actually follows from our inner product axioms above, so it doesn’t depend on the specific 


inner product that we're using. 


Proof. We use the axiom that $a \cdot a \ge 0$ for any vector $a$. (If $b = 0$, both sides of the inequality are zero and there is nothing to prove, so assume $b \neq 0$.) Consider the orthogonal projection of $a$ onto $b$: that is, split up the vector into components

$$a = a_\parallel + a_\perp$$

such that $a_\parallel$ is along the direction of $b$, and $a_\perp$ and $a_\parallel$ are perpendicular to each other. So we know that

$$a_\perp = a - a_\parallel,$$

but we also know that (by the projection formula) we have

$$a_\parallel = \frac{(a \cdot b)\, b}{b \cdot b} = \frac{(a \cdot b)\, b}{|b|^2},$$

because we can think of this as taking the product of $a$ with the unit vector along the $b$-direction, multiplied by that unit vector. (Then there is a factor of "length of $b$" twice in the numerator and also twice in the denominator, so they cancel out.) As a check to make sure this is correct, note that

$$a_\perp \cdot b = \left(a - \frac{(a \cdot b)\, b}{b \cdot b}\right) \cdot b$$

(by substitution), which simplifies (by distributivity) to

$$a \cdot b - \frac{(a \cdot b)(b \cdot b)}{b \cdot b} = 0.$$

So indeed $a_\perp$ is perpendicular to $b$, and now we're going to use the fact that $a_\perp$ dotted with itself is nonnegative (after all, the equality case of the Schwarz inequality is when $a$ and $b$ are parallel to each other). Thus

$$a_\perp \cdot a_\perp = \left(a - \frac{(a \cdot b)\, b}{b \cdot b}\right) \cdot \left(a - \frac{(a \cdot b)\, b}{b \cdot b}\right) \ge 0,$$

and expanding this out yields

$$a \cdot a - 2\,\frac{(a \cdot b)^2}{b \cdot b} + \frac{(a \cdot b)^2}{b \cdot b} \ge 0.$$

Combining like terms and multiplying through by $(b \cdot b)$, which is positive, yields

$$(a \cdot a)(b \cdot b) - (a \cdot b)^2 \ge 0 \implies |a|^2 |b|^2 \ge |a \cdot b|^2,$$

and taking positive square roots yields the result.


(There will be a version of the Schwarz inequality for complex vector spaces, but we won't prove that one in class.) One interesting question is to ask where this inequality is saturated, which means that our inequality is actually an equality. And this only occurs if $a_\perp$ is the zero vector, which means $a$ and $b$ are parallel when the Schwarz inequality is an equality.

Again, notice that we never used the formula for the inner product in this proof! So we've abstracted away the 


important properties, and that will become useful as we transition into complex vector spaces. 
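The decomposition used in the proof can be checked on a concrete example. The sketch below uses the ordinary dot product on $\mathbb{R}^3$ with exact fractions; the vectors `a` and `b` and the helper names are arbitrary illustrative choices.

```python
from fractions import Fraction as F

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def split_along(a, b):
    """Return (a_par, a_perp): the parts of a along b and perpendicular to b."""
    coeff = dot(a, b) / dot(b, b)   # requires b != 0
    a_par = [coeff * bi for bi in b]
    a_perp = [ai - pi for ai, pi in zip(a, a_par)]
    return a_par, a_perp

a = [F(3), F(1), F(2)]
b = [F(1), F(2), F(2)]
a_par, a_perp = split_along(a, b)

print(dot(a_perp, b))                            # perpendicularity check
print(dot(a_perp, a_perp))                       # nonnegative by axiom 1
print(dot(a, a) * dot(b, b) - dot(a, b) ** 2)    # Schwarz: this is >= 0
```

The last two printed numbers are related by the step in the proof: $(a \cdot a)(b \cdot b) - (a \cdot b)^2 = (b \cdot b)(a_\perp \cdot a_\perp)$.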
Remark 95. Required content now resumes. 


So now let's take the inner product axioms that we've been discussing in the real vector space case and transfer them to our complex vector space. We'll stop using the dot product notation in favor of a new notation that shows that we start with two vectors and get out a number: our inner product will now look like $\langle \cdot, \cdot \rangle$, where the dots stand for vectors. As before, this inner product will be equal to a number, but now it will be a complex number.

This time, the order of our vectors may matter, and for inspiration let's try to imagine how we might define an inner product on $\mathbb{C}^n$. In such a vector space, vectors are of the form $z = (z_1, z_2, \dots, z_n)$, where the $z_i$s are complex numbers, and we know that we define a complex number's length squared by multiplying it by its complex conjugate. Thus, we'll want to define

$$|z|^2 = z_1^* z_1 + z_2^* z_2 + \cdots + z_n^* z_n.$$


This will be a real number, and it’s always nonnegative — in fact, it only vanishes if all of the components are zero. So 
this is a nice model for the length squared of a vector, and it suggests a possible nice definition for an inner product 


in general: perhaps we will want 


$$\boxed{\langle w, z \rangle = w_1^* z_1 + w_2^* z_2 + \cdots + w_n^* z_n}.$$


Notice now that the vectors w and z play different roles — we complex conjugate the ws, but we don't do this for the 
zs. And we need to do this so that we can actually define a length for our vectors! 
But we want to get a set of axioms that tell us all of the interesting and necessary properties of the inner product, 


and that’s what we're going to do now: 


Definition 96 
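An inner product on a complex vector space is a number-valued function $\langle \cdot, \cdot \rangle$ which satisfies the following axioms:

1. $\langle v, v \rangle \ge 0$ for all vectors $v$. (In particular, this quantity is always real.)

2. $\langle v, v \rangle = 0$ if and only if $v = 0$. (This will be useful for proofs of many properties.)

3. $\langle u, v_1 + v_2 \rangle = \langle u, v_1 \rangle + \langle u, v_2 \rangle$.

4. $\langle u, \alpha v \rangle = \alpha \langle u, v \rangle$ for any complex number $\alpha$.

5. $\langle u, v \rangle = \langle v, u \rangle^*$.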


Axioms (3) and (4) are particularly noteworthy here: they're not actually identical to the real vector space case! 


After all, if we switch the order of our vectors, we're conjugating different components. 


Proposition 97 


For any vector $u$, we have $\langle u, 0 \rangle = 0$, and similarly $\langle 0, u \rangle = 0$.
Proof. Plug in $v_2 = 0$ into axiom (3) of inner products to find that

$$\langle u, v_1 \rangle = \langle u, v_1 \rangle + \langle u, 0 \rangle,$$

so indeed $\langle u, 0 \rangle = 0$. Take the complex conjugate of both sides to find that $\langle 0, u \rangle = 0$ by axiom (5).


Axiom (3) shows linearity in the second argument only, and it turns out that we can use this to also figure out what happens with the first argument: suppose we want to compute $\langle u_1 + u_2, v \rangle$. We can use axioms (5) and (3) to say that

$$\langle u_1 + u_2, v \rangle = \left(\langle v, u_1 + u_2 \rangle\right)^* = \left(\langle v, u_1 \rangle + \langle v, u_2 \rangle\right)^* = \langle v, u_1 \rangle^* + \langle v, u_2 \rangle^*.$$

And now use axiom (5) again to reverse the inner products, and we've found additivity in the first entry as well:

$$\boxed{\langle u_1 + u_2, v \rangle = \langle u_1, v \rangle + \langle u_2, v \rangle}.$$


But what's most interesting is axiom (4): if we apply the same logic, we find that 


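$$\boxed{\langle \alpha u, v \rangle} = \left(\langle v, \alpha u \rangle\right)^* = \left(\alpha \langle v, u \rangle\right)^* = \boxed{\alpha^* \langle u, v \rangle},$$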


and this is a point of potential mistakes: we have conjugate homogeneity. A complex number comes out of the left 
of the bracket with a conjugate, but it comes out of the right of the bracket unaffected! So again, this shows that 
the role of the first and second entries in our inner product is not identical. 
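The asymmetry between the two slots is easy to see numerically. The sketch below uses the model inner product on $\mathbb{C}^2$ suggested earlier (conjugate the first argument, sum over components); the vectors, the scalar `alpha`, and the helper names are illustrative assumptions.

```python
def inner(w, z):
    """Model inner product on C^n: conjugate the first argument's components."""
    return sum(complex(wi).conjugate() * zi for wi, zi in zip(w, z))

def scale(alpha, v):
    return [alpha * vi for vi in v]

u = [1 + 2j, 3j]
v = [2 - 1j, 1 + 1j]
alpha = 2 + 3j

print(inner(u, v))                                   # a complex number in general
print(inner(v, u), inner(u, v).conjugate())          # axiom (5): these agree
print(inner(u, scale(alpha, v)), alpha * inner(u, v))               # linear in the second slot
print(inner(scale(alpha, u), v), alpha.conjugate() * inner(u, v))   # conjugate-linear in the first
print(inner(u, u))                                   # real and nonnegative
```

Swapping which argument is conjugated gives the other common convention; the choice here (conjugate on the left) matches the boxed result above.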


But now that we've defined an inner product, we can do more useful things with it: we can define the length 
$$|v|^2 = \langle v, v \rangle,$$


and we can also start relating vectors to each other: 


Definition 98 


Two vectors $u, v$ are orthogonal if $\langle u, v \rangle = 0$ (which also means $\langle v, u \rangle = 0$).


Notice that we've set our inner product to be non-degenerate by axiom (2): 


Lemma 99 


If $\langle x, v \rangle = 0$ for all $v$, then $x = 0$.


Proof. Set $v = x$: then $\langle x, x \rangle = 0$, which by axiom (2) holds if and only if $x = 0$.


We also have two other nice properties: the Schwarz inequality still holds, so we have

$$|\langle u, v \rangle| \le |u|\,|v|.$$


(We'll see the proof in our homework for the complex-valued case.) This is extremely useful — for example, it'll be used 
to prove the uncertainty principle. And the saturation point of this inequality is where v = cu for a complex number 
c: this is the equivalent statement to v and u being parallel. 
We also have the triangle inequality 
$$|u + v| \le |u| + |v|.$$


This is a geometric statement, and it's saturated when v and u point in the same direction: that is, we have v = cu 


for a real positive constant c. 
So we've now gotten to a very important point: we can now define a finite-dimensional Hilbert space.


Definition 100 


A finite-dimensional Hilbert space is a finite-dimensional vector space equipped with an inner product satisfying 


the appropriate axioms. 


In the infinite-dimensional Hilbert space case, we actually need to add a more subtle condition: every Cauchy sequence of vectors must converge to a vector in the space. (This is the mathematical property of being complete.) Luckily, for this class, we won't need to worry about this assumption.

With this, it’s time for us to go back to our basis and basis vectors. It will be convenient for us to pick nice bases 


in which computations are simple, and that’s what we'll quantify right now: 


Definition 101 


A set of vectors $(e_1, \dots, e_n)$ is orthonormal if

$$(e_i, e_j) = \delta_{ij}.$$


A basis of vectors which are orthonormal is an orthonormal basis. 


Here, “ortho” comes from “orthogonal,” and “normal” comes from “normalized” — any two vectors are orthogonal, 
and each vector has length 1. 


The reason orthonormal bases are nice is that we can do certain computations easily: for example, if
$$v = a_1 e_1 + \cdots + a_n e_n,$$
then $|v|^2 = (v, v)$ can be expanded out to
$$|v|^2 = (a_1 e_1 + \cdots + a_n e_n,\ a_1 e_1 + \cdots + a_n e_n).$$

By homogeneity on the right, we can take out the constants on the right, and similarly we can take out constants (with conjugate factors) from the left. So our terms will look like
$$|v|^2 = \sum_{i,j} a_i^* a_j\, (e_i, e_j),$$
and the only terms that survive are the diagonal ones where $i = j$, leading us to a final answer of
$$|v|^2 = a_1^* a_1 + a_2^* a_2 + \cdots + a_n^* a_n = |a_1|^2 + \cdots + |a_n|^2.$$
This is a Pythagorean-like theorem: the length squared is the sum of the squared magnitudes of the components if we use an orthonormal basis.


Proposition 102 


An orthonormal set of vectors is linearly independent.


Proof. Call the vectors in this set $e_1, \dots, e_n$. Suppose that we know that
$$a_1 e_1 + \cdots + a_n e_n = 0.$$
If this vector is equal to 0, then it must have zero length (by axioms of inner products), and thus the length squared, $a_1^* a_1 + \cdots + a_n^* a_n = |a_1|^2 + \cdots + |a_n|^2$, must be zero. So all of the $a_i$ are zero, and indeed the vectors are linearly independent.
So now if $(e_1, \dots, e_n)$ is an orthonormal basis, we can write any vector as some linear combination $v = \sum_i a_i e_i$.
And then notice that
$$(e_j, v) = \sum_{i=1}^n (e_j, a_i e_i)$$
by linearity, and then everything disappears except the $i = j$ term:
$$= \sum_{i=1}^n a_i \delta_{ij} = a_j.$$
This means that we can write any vector $v$ as
$$v = (e_1, v)\, e_1 + \cdots + (e_n, v)\, e_n.$$


Near the beginning of our discussion of linear algebra, we mentioned that a basis always exists for a finite-dimensional 
complex (or real) vector space. And it turns out that we can always get an orthonormal basis as well! This 


procedure is very practical — it’s known as the Gram-Schmidt procedure, and it goes as follows: 


- Assume we're given a list of linearly independent vectors $(v_1, \dots, v_n)$ that span some subspace of $V$: we'll construct an orthonormal basis $(e_1, \dots, e_n)$ of that same subspace.

- Pick our first vector by normalizing: we let
$$e_1 = \frac{v_1}{|v_1|}.$$
(This denominator is not zero, because the zero vector can't be in a linearly independent set.)

- Pick the second vector by starting with $v_2$ and making it orthogonal to $e_1$: because $v_1$ and $v_2$ are linearly independent, we can use $e_2' = v_2 + a e_1$ for some $a$. To figure out what $a$ should be, note that we want
$$(e_1, e_2') = (e_1, v_2) + a\,(e_1, e_1) = 0 \implies a = -(e_1, v_2).$$
So we subtract off a bit of $e_1$ (we won't end up with the zero vector because of linear independence), and then we just need to normalize our vector by dividing by its length:
$$e_2 = \frac{v_2 - (e_1, v_2)\, e_1}{|v_2 - (e_1, v_2)\, e_1|}.$$

- We can do this inductively as well: we just subtract off a bit of each of $e_1$ through $e_{j-1}$ when we are creating our $j$th vector:
$$e_j = \frac{v_j - (e_1, v_j)\, e_1 - \cdots - (e_{j-1}, v_j)\, e_{j-1}}{|v_j - (e_1, v_j)\, e_1 - \cdots - (e_{j-1}, v_j)\, e_{j-1}|}.$$
Indeed, we can check that (by construction) this vector is orthogonal to all of the first $j - 1$ basis vectors, and it also has length 1.


Again, this is a useful procedure, and we'll get some practice with it soon! 
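
As a bit of practice, here is a minimal sketch of the Gram-Schmidt procedure in plain Python. It is only an illustration under some assumptions: vectors are stored as lists of complex numbers, the inner product is the standard one on $\mathbb{C}^n$ (conjugate-linear in the first entry), and the names `inner`, `norm`, `gram_schmidt`, and the sample vectors are made up for this example.

```python
import math

def inner(u, v):
    """Standard inner product (u, v) on C^n: conjugate-linear in u, linear in v."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def norm(v):
    """Length |v| = sqrt((v, v))."""
    return math.sqrt(inner(v, v).real)

def gram_schmidt(vectors):
    """Turn linearly independent vectors into an orthonormal list with the same span."""
    basis = []
    for v in vectors:
        w = list(v)
        # Subtract off the component of v along each earlier basis vector e.
        for e in basis:
            coeff = inner(e, v)
            w = [wi - coeff * ei for wi, ei in zip(w, e)]
        # Nonzero by linear independence, so we can normalize.
        length = norm(w)
        basis.append([wi / length for wi in w])
    return basis

# Three linearly independent sample vectors in C^3 (chosen arbitrarily).
vs = [[1 + 0j, 1j, 0j], [1 + 0j, 0j, 1 + 0j], [0j, 1 + 0j, 1 + 0j]]
es = gram_schmidt(vs)
```

Each inner product $(e_i, e_j)$ of the output should come out as $\delta_{ij}$ up to floating-point rounding, and the first output vector is just $v_1 / |v_1|$.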
One other thing we can say about the inner product is that it helps us build subspaces. The concept of orthogonality 
is very powerful: for example, the set of vectors orthogonal to a given vector v is a subspace (since the sum of 


two vectors orthogonal to v, as well as a scalar multiple of a vector orthogonal to v, are still orthogonal to v). 
Definition 103


Let $U$ be a subset of a vector space $V$. Then $U^\perp$ is the set of vectors $v \in V$ such that $(v, u) = 0$ for all $u \in U$.


With this, a natural case to consider is that where $U$ is a subspace. What's nice is that $U$ and $U^\perp$ then decompose


our full vector space nicely: 


Theorem 104 


Let $U$ be a subspace of a vector space $V$. Then we can write $V$ as a direct sum

$$V = U \oplus U^\perp.$$

Here, $U^\perp$ is known as the orthogonal complement of $U$. This might be intuitive to us: for example, the xy-plane


and the z axis together give a decomposition of three-dimensional space. 


Proof. Concretely, $V = U \oplus U^\perp$ means that every vector in $V$ can be uniquely written as the sum of a vector in $U$ and
a vector in $U^\perp$.

First, we'll find a way to write every vector of $V$ in this way: let $(e_1, \dots, e_n)$ be an orthonormal basis for $U$. We can write any vector $v$ as
$$v = \big[(e_1, v)\, e_1 + \cdots + (e_n, v)\, e_n\big] + \big[v - (e_1, v)\, e_1 - \cdots - (e_n, v)\, e_n\big]$$
(everything trivially cancels on the right hand side except $v$), but now we claim that we already have a way of
representing $v$: the first bracket term is in $U$, and the second bracket term is in $U^\perp$. The first term is clearly in
$U$, and it's fairly easy to check that the second term is in $U^\perp$ because it's orthogonal to all of the basis vectors of $U$: taking the inner product with any $e_j$ gives $(e_j, v) - (e_j, v) = 0$.

Finally, we need to show that $U \cap U^\perp$ only contains the zero vector, so that our representation is unique. Suppose
that $v$ is in both $U$ and $U^\perp$: then $(v, v)$ is the inner product of something in $U$ and something in $U^\perp$, so it must be 0. And
thus $v = 0$ by our inner product axioms.


So any Hilbert space can be written as a direct sum of a subspace and its orthogonal complement — this result will 
be very useful in the future. And this brings us to the final idea from this lecture: orthogonal projectors. We'll start 
with a motivating example: in three dimensions, a vector has an x, y, and z-component. Consider a linear operator 
which just preserves the x-component: this would be a projection into the x-direction, which is a one-dimensional 
subspace. So we should think of projectors as operators which “forget” some things about our vector. 

Let's think about what this projector looks like: it's a linear operator, so we can represent it as a matrix. If a vector
has components $v = (v_1, v_2, v_3)$, we have the projector into the x-direction
$$P_x = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
(Indeed, we can check that $P_x v = (v_1, 0, 0)$, as desired.) Similarly, just having a 1 in the middle will give us $P_y$, and a
1 in the bottom right entry will give us $P_z$.

We do want to understand these projectors in more detail, though, beyond what their matrices look like. This is because
we have a measurement postulate in quantum mechanics, where our wave function collapses into an eigenstate when 
we measure it: that’s actually a projection operator. 


So now we're ready to state things more generally: 
Definition 105


Let $U$ be a subspace of a vector space $V$. The orthogonal projector $P_U$ is defined as follows: we can uniquely
write any vector $v \in V$ as $u + w$, where $u \in U$ and $w \in U^\perp$, and then we define $P_U(v) = u$.


As we've said already, we keep the part of the vector in $U$ and we throw the rest away. This is a linear operator
(we can check this by explicitly verifying the conditions), and note that we can also view this operator as saying that
$$P_U(u + w) = P_U(u) + P_U(w) = u + 0.$$


With this more formal definition, we can start thinking about some more properties: it’s called the orthogonal 


projector because it uses the orthogonal decomposition $U \oplus U^\perp$.


Fact 106 


The null space of the linear operator $P_U$ is $U^\perp$, because these are the vectors with no component in $U$.


In particular, if $U$ is not the whole vector space, $U^\perp$ is not just the zero vector, so $P_U$ is not injective.


Fact 107
The range of the linear operator $P_U$ is $U$: we can't get any components outside of $U$, and any vector $u \in U$ is
sent to itself.


Let's restate this last fact in a slightly different way: for any vector $u \in U$, we have $P_U(u) = u$, and for any vector
$w \in U^\perp$, we have $P_U(w) = 0$. So we can write a formula for the action of the projection. Letting $(e_1, \dots, e_n)$ be an
orthonormal basis for $U$, we have that
$$P_U(v) = (e_1, v)\, e_1 + \cdots + (e_n, v)\, e_n.$$

(Notice that we've dropped the basis vectors for $U^\perp$ here, because they are killed by $P_U$.) Note also that
$$P_U(P_U(v)) = P_U(u) = u \implies P_U P_U = P_U,$$
because we don't do anything else to our vector after we project once: we're already in $U$ after one projection.


Fact 108 
However, there are linear operators $T$ with $T^2 = T$ that are not orthogonal projectors: this condition is necessary


but not sufficient. 


Proposition 109 


For any orthogonal projection $P_U$ and vector $v$,

$$|P_U(v)| \le |v|.$$


This should be intuitively obvious — we're losing perpendicular components, so the total length is smaller. 


Proof. More rigorously, write $v = u + w$ with $u \in U$ and $w \in U^\perp$, and note that
$$(v, v) = (u + w, u + w),$$
and now we can expand out by linearity. Since $u$ and $w$ are orthogonal by definition, the cross terms disappear, and
we have
$$|v|^2 = |u|^2 + |w|^2.$$
This is the Pythagorean theorem, and now $|u| = |P_U(v)|$, so indeed $|v|$ is at least as large as $|P_U(v)|$, as
desired.


One final way we can describe this projector is in terms of eigenvalues here, and this story is particularly simple — 
we should keep it in mind. Remember that eigenvectors correspond to a specific kind of invariant subspace, and the 


most obvious invariant subspace of $P_U$ is $U$ itself.


Proposition 110 


For an orthogonal projector $P_U$, any nonzero vector in $U$ is an eigenvector with eigenvalue 1, and any nonzero vector in $U^\perp$ is an eigenvector with eigenvalue 0.


In particular, if we want to find a basis of eigenvectors, we can just pick orthonormal basis vectors of $U$ and $U^\perp$.
And notice that we could have predicted this from the start: we know that our operator satisfies the equation $P_U^2 = P_U$,
so the eigenvalues must also satisfy that equation: $\lambda^2 = \lambda \implies \lambda \in \{0, 1\}$. And in this case, the number of ones
depends on the dimension of $U$, and to understand that better, we can talk again about the matrix representation.


If we write $V = U \oplus U^\perp$, and we have an orthonormal basis of $V$ of the form
$$(e_1, e_2, \dots, e_n, f_1, \dots, f_k),$$
where the $e$s are a basis for $U$ and the $f$s are a basis for $U^\perp$, we can represent $P_U$ with the $(n+k) \times (n+k)$ matrix
$$P_U = \operatorname{diag}(\underbrace{1, \dots, 1}_{n}, \underbrace{0, \dots, 0}_{k}) = \begin{pmatrix} I_{n \times n} & 0 \\ 0 & 0_{k \times k} \end{pmatrix},$$

where the top left corner forms an n x n identity matrix and we have zeros everywhere else. (There are k rows and k 
columns of all zeros.) This is the representation when our first n vectors are in U — we can check that this gives us 
back the components in U. 

Of course, this matrix will look different in different bases, but there are a few invariant properties. The trace of 
this matrix will be $n$, which is the dimension of the space $U$; in such a case, what we have is called a rank $n$ projector.
And its determinant is 0 (unless the projector is the identity projector), because our projection operator is not injective 
and therefore not invertible. 

We'll only be discussing orthogonal projectors in this class, and the key thing to remember is that these come out 
of a decomposition of a vector space as $U \oplus U^\perp$. And that gives us all of the nice properties that we've been talking


about! 
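
To make these properties concrete, here is a small numerical sketch of an orthogonal projector built from the formula $P_U(v) = (e_1, v)e_1 + \cdots + (e_n, v)e_n$. Everything specific here is an assumption made for illustration: the two-dimensional subspace $U \subset \mathbb{C}^3$, its orthonormal basis `onb_U`, the test vector `v`, and the helper names `inner` and `project`.

```python
import math

def inner(u, v):
    """Standard inner product (u, v) on C^n: conjugate-linear in u, linear in v."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def project(onb, v):
    """P_U(v) = sum_i (e_i, v) e_i, where onb is an orthonormal basis of U."""
    out = [0j] * len(v)
    for e in onb:
        c = inner(e, v)
        out = [o + c * ei for o, ei in zip(out, e)]
    return out

s = 1 / math.sqrt(2)
onb_U = [[s, 1j * s, 0], [0, 0, 1]]    # sample orthonormal basis of a 2D subspace U
v = [1 + 2j, 3 - 1j, 2j]               # arbitrary test vector

u = project(onb_U, v)                   # the part of v in U
w = [vi - ui for vi, ui in zip(v, u)]   # the part of v in U-perp

# Trace of the matrix of P_U in the standard basis: sum_k (std_k, P_U std_k).
std = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
trace = sum(inner(e, project(onb_U, e)) for e in std)
```

Projecting `u` again returns `u` (so $P_U P_U = P_U$), `w` is orthogonal to both basis vectors of $U$, the lengths satisfy $|v|^2 = |u|^2 + |w|^2$, and the trace comes out as 2, the dimension of $U$.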


10 February 18, 2020 


Today, we're continuing with matters of linear algebra — there are about three more lectures on this topic. As a 


reminder, our second problem set is due on Friday. 
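We'll start by completing an exercise from last time: as a review, let's look some more at the spin-1/2 model. In
about a week, we'll talk about bra and ket vectors formally, but we'll start doing some manipulation of them right now.


If we represent our basis states (spin up and down) via
$$|+\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad \text{and} \quad |-\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix},$$
we can write a general state as
$$|\psi\rangle = c_1 |+\rangle + c_2 |-\rangle = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}.$$
Bra vectors, on the other hand, are the conjugate transpose: we have $\langle +| = \begin{pmatrix} 1 & 0 \end{pmatrix}$ and $\langle -| = \begin{pmatrix} 0 & 1 \end{pmatrix}$, which tells us
that
$$\langle \psi| = c_1^* \langle +| + c_2^* \langle -| = \begin{pmatrix} c_1^* & c_2^* \end{pmatrix}.$$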


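One special state is the one where the spin points along some unit vector $\hat n$: we've shown in lectures that
$$|\hat n\rangle = \cos\frac{\theta}{2}\, |+\rangle + \sin\frac{\theta}{2}\, e^{i\phi}\, |-\rangle.$$
(This is confirmed via the fact that $|\hat n\rangle$ is an eigenvector for our operator $S_{\hat n} = \hat n \cdot \vec S = \frac{\hbar}{2}\, \hat n \cdot \vec\sigma$.)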
So let's say we have some object $|+\rangle\langle +|$: what exactly is this? We should think of this as an operator: this object
acting on $|+\rangle$ gives us
$$|+\rangle\langle +|+\rangle = |+\rangle,$$
and acting on $|-\rangle$ gives
$$|+\rangle\langle +|-\rangle = 0,$$
because $\{|+\rangle, |-\rangle\}$ form an orthonormal basis. (It takes some getting used to, because $|+\rangle$ and $|-\rangle$ seem to point in
opposite directions!) So this tells us the matrix representation: we know how it acts on $|+\rangle$ and $|-\rangle$, and actually one
shortcut we can take is to just take the matrix representations of $|+\rangle$ and $\langle +|$ and multiply them together:
$$|+\rangle\langle +| = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.$$
A justification for this is that the individual parts $|+\rangle$ and $\langle +|$ are operators of some sort, and writing them next to
each other is essentially defined by multiplication.


Example 111 


What is the matrix representation of $|\hat n\rangle\langle \hat n|$? (This should be a $2 \times 2$ matrix.)


Much like in the example above, note that
$$|\hat n\rangle\langle \hat n|\hat n\rangle = |\hat n\rangle,$$
while
$$|\hat n\rangle\langle \hat n|{-\hat n}\rangle = 0,$$
so this is a projection operator onto the vector in the $\hat n$ direction. But we want to use Pauli matrices to represent this
more explicitly.
Fact 112
Note that it's good to know the double-angle identities
$$\sin(2x) = 2\sin x \cos x, \qquad \cos(2x) = \cos^2 x - \sin^2 x.$$
One way to derive these is to use the formula $e^{2ix} = e^{ix} \cdot e^{ix}$ and equate real and imaginary parts.


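Here's the right way to do this problem: note that
$$|\hat n\rangle = \begin{pmatrix} \cos\frac{\theta}{2} \\ \sin\frac{\theta}{2}\, e^{i\phi} \end{pmatrix} \implies \langle \hat n| = \begin{pmatrix} \cos\frac{\theta}{2} & \sin\frac{\theta}{2}\, e^{-i\phi} \end{pmatrix}.$$
So now we do the outer product
$$|\hat n\rangle\langle \hat n| = \begin{pmatrix} \cos\frac{\theta}{2} \\ \sin\frac{\theta}{2}\, e^{i\phi} \end{pmatrix} \begin{pmatrix} \cos\frac{\theta}{2} & \sin\frac{\theta}{2}\, e^{-i\phi} \end{pmatrix} = \begin{pmatrix} \cos^2\frac{\theta}{2} & \cos\frac{\theta}{2}\sin\frac{\theta}{2}\, e^{-i\phi} \\ \sin\frac{\theta}{2}\cos\frac{\theta}{2}\, e^{i\phi} & \sin^2\frac{\theta}{2} \end{pmatrix}.$$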


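We want to get to the $\hat n$ vector, which contains $\sin\theta$ and $\cos\theta$ (spherical coordinates), so we need to get rid of these
half-angles $\frac{\theta}{2}$. Using the double angle formulas, we have
$$\cos\theta = \cos^2\frac{\theta}{2} - \sin^2\frac{\theta}{2} = 2\cos^2\frac{\theta}{2} - 1 \implies \cos^2\frac{\theta}{2} = \frac{1 + \cos\theta}{2}.$$
A similar calculation gives us $\sin^2\frac{\theta}{2} = \frac{1 - \cos\theta}{2}$, and we can also use that $\cos\frac{\theta}{2}\sin\frac{\theta}{2} = \frac{\sin\theta}{2}$, so
$$|\hat n\rangle\langle \hat n| = \begin{pmatrix} \frac{1 + \cos\theta}{2} & \frac{\sin\theta}{2}\, e^{-i\phi} \\ \frac{\sin\theta}{2}\, e^{i\phi} & \frac{1 - \cos\theta}{2} \end{pmatrix}.$$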


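We can now break this up some more into the components of $\hat n$, which are $n_x = \sin\theta\cos\phi$, $n_y = \sin\theta\sin\phi$, and
$n_z = \cos\theta$:
$$\sin\theta\, e^{-i\phi} = n_x - i n_y, \qquad \sin\theta\, e^{i\phi} = n_x + i n_y,$$
so our operator is actually equal to
$$\frac{1}{2}\begin{pmatrix} 1 + n_z & n_x - i n_y \\ n_x + i n_y & 1 - n_z \end{pmatrix} = \frac{1}{2}\left(I + n_x \sigma_x + n_y \sigma_y + n_z \sigma_z\right),$$
which gives us what we want:
$$|\hat n\rangle\langle \hat n| = \frac{1}{2}\left(I + \hat n \cdot \vec\sigma\right),$$
where $\vec\sigma = (\sigma_x, \sigma_y, \sigma_z)$ contains the three Pauli matrices. Indeed, we can now check that if $\hat n$ points in the z-direction,
we recover our matrix
$$|+\rangle\langle +| = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.$$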
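
As a sanity check on this example, the following sketch compares the outer product $|\hat n\rangle\langle \hat n|$ with the Pauli-matrix form $\frac{1}{2}(I + \hat n\cdot\vec\sigma)$ numerically. The angles and the function names (`spin_state`, `outer`, `pauli_form`) are assumptions chosen just for this illustration.

```python
import cmath
import math

def spin_state(theta, phi):
    """Column vector for |n>: (cos(theta/2), sin(theta/2) e^{i phi})."""
    return [complex(math.cos(theta / 2)), math.sin(theta / 2) * cmath.exp(1j * phi)]

def outer(ket):
    """Matrix of |ket><ket|: entry (a, b) is ket_a times the conjugate of ket_b."""
    return [[a * b.conjugate() for b in ket] for a in ket]

def pauli_form(theta, phi):
    """Matrix of (I + n . sigma) / 2 for the unit vector n with angles (theta, phi)."""
    nx = math.sin(theta) * math.cos(phi)
    ny = math.sin(theta) * math.sin(phi)
    nz = math.cos(theta)
    return [[(1 + nz) / 2, (nx - 1j * ny) / 2],
            [(nx + 1j * ny) / 2, (1 - nz) / 2]]

theta, phi = 0.7, 2.1   # arbitrary sample direction
lhs = outer(spin_state(theta, phi))
rhs = pauli_form(theta, phi)
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))

z_projector = outer(spin_state(0.0, 0.0))   # n along z: should reproduce |+><+|
```

The two matrices agree up to rounding error, and setting $\theta = 0$ gives back the matrix with a single 1 in the top left corner.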


We'll now move on to another topic: Taylor series for operators. Is this just a notation for physicists, or is this 


actually an object in mathematics? We have the famous equation 


$$e^{i\theta} = \cos\theta + i\sin\theta,$$
which we can derive in many ways, but one is to use a series expansion and equate real and imaginary parts of the
Taylor series.


So if we have a matrix $M$, how can we evaluate the function $e^{iM\theta}$? It probably doesn't make sense to exponentiate
each entry: this wouldn't have nice properties, and it would behave badly. So the best hope we can have is to mimic
the power series for $e^x$: this is a definition, and that would just tell us that
$$e^{iM\theta} = I + iM\theta - \frac{M^2\theta^2}{2!} - \frac{iM^3\theta^3}{3!} + \cdots.$$


Every term here is a matrix, which is why we turn the 1 at the beginning into an identity matrix. And now this sort of
makes sense: if $M$ is actually a diagonal matrix, then $e^{iM\theta}$ is just the diagonal matrix in which each diagonal entry $m$ is replaced by $e^{im\theta}$.
But one nice thing to notice here: suppose that our matrix $M$ satisfies $M^2 = I$. (For example, this is true for all
of our Pauli matrices.) Then $e^{iM\theta}$ will take on a particularly nice form: we derived $e^{i\theta} = \cos\theta + i\sin\theta$ solely from the
fact that $i^2 = -1$, and since the series for $e^{iM\theta}$ also splits up nicely between the odd and even terms, we'll see (on our homework)
that this gives something pretty clean as well, because $(iM)^2 = -I$ is the negative identity matrix.
If we're in two dimensions, things are even nicer: specifically, it's nice to look at $e^{iM}$ when $M$ is a Hermitian matrix.


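Consider an object like
$$e^{i\vec a \cdot \vec\sigma}.$$
We can rewrite this by factoring out the length of $\vec a$:
$$= e^{i|\vec a|\,(\hat a \cdot \vec\sigma)},$$
and it turns out $(\hat a \cdot \vec\sigma)^2 = I$ (this takes a few lines, but it's not too bad to show), and then we can use the same
"Euler's identity" expansion. This yields
$$e^{i\vec a \cdot \vec\sigma} = I\cos|\vec a| + i\,(\hat a \cdot \vec\sigma)\sin|\vec a|.$$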
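
We can test this formula by summing the power series directly. The sketch below is only a numerical illustration: the vector `a`, the number of series terms, and the helper names (`matmul`, `lincomb`, `exp_series`) are assumptions, and the truncated series is a stand-in for the true matrix exponential.

```python
import math

I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]

def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def lincomb(coeffs, mats):
    """Entrywise linear combination sum_m coeffs[m] * mats[m]."""
    n = len(mats[0])
    return [[sum(c * M[i][j] for c, M in zip(coeffs, mats)) for j in range(n)] for i in range(n)]

def exp_series(X, terms=40):
    """Truncated power series I + X + X^2/2! + ... for the matrix exponential."""
    n = len(X)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # After this step, term holds X^k / k!.
        term = [[t / k for t in row] for row in matmul(term, X)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

a = (0.3, -1.2, 0.5)                          # arbitrary sample vector
length = math.sqrt(sum(x * x for x in a))     # |a|
a_dot_sigma = lincomb(a, (SX, SY, SZ))

series = exp_series([[1j * x for x in row] for row in a_dot_sigma])
closed = lincomb((math.cos(length), 1j * math.sin(length) / length), (I2, a_dot_sigma))
max_diff = max(abs(series[i][j] - closed[i][j]) for i in range(2) for j in range(2))
```

Here `closed` is $I\cos|\vec a| + i(\hat a\cdot\vec\sigma)\sin|\vec a|$, written using $\hat a = \vec a/|\vec a|$, and it matches the summed series up to rounding error.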


Our final topic for today will be that of inner products, which are an object that we often have to invent. We 
usually need to make a definition that satisfies a set of axioms, and here’s an example where that definition may not 


be so obvious. 


Example 113 


Let $V = M_N(\mathbb{C})$ be the vector space of $N \times N$ complex-valued matrices: how can we define an inner product?


We know that if we multiply any matrix A (which is a vector in our vector space V) by a complex number a, then 
we just multiply all of the entries by a. So we need to define the quantity (A, B) in a way that satisfies all of the 
axioms: it must be linear on the right entry and antilinear on the left entry, it must be positive definite, and so on. 

The most difficult part of this is that $(A, A)$ has to be some nonnegative real number, so something silly like
$(A, B) = A_{13} B_{37}$ isn't going to work. Remember that we have two "machines" that give us numbers out of matrices:
the determinant and the trace. Maybe we want something like
$$(A, B) = \det(AB),$$
but this is bad: remember that the determinant grows by $a^N$ if we multiply $A$ by $a$. Luckily, the trace is much better:
it scales by the right amount, and if we didn't know anything, we might want to say that we take the squared
magnitude of all entries and add them all up. What's great is that if the norm is 0, the matrix has to be 0, so we must
be almost correct. So here's the real answer we're going for:
$$(A, B) = \operatorname{tr}(A^\dagger B).$$


We can verify that this does give us exactly what we want! 
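
Here is one way to carry out that verification numerically on a small case. The matrices `A` and `B`, the scalar `a`, and the helper names are all assumptions picked for illustration; the check covers conjugate symmetry, homogeneity in each entry, and the fact that $(A, A)$ is the sum of squared magnitudes of the entries.

```python
def dagger(A):
    """Conjugate transpose: (A^dagger)_{ij} = conj(A_{ji})."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def inner(A, B):
    """Candidate inner product on N x N matrices: (A, B) = tr(A^dagger B)."""
    return trace(matmul(dagger(A), B))

def scale(a, A):
    return [[a * x for x in row] for row in A]

A = [[1 + 2j, 0j], [3j, 4 + 0j]]
B = [[2 - 1j, 1 + 0j], [0j, 1j]]
a = 2 - 3j

norm_sq = inner(A, A)                          # 5 + 0 + 9 + 16 = 30
entry_sum = sum(abs(x) ** 2 for row in A for x in row)
```

With these definitions, `inner(A, scale(a, B))` equals `a * inner(A, B)`, while `inner(scale(a, A), B)` picks up the conjugate of `a`, matching linearity on the right and antilinearity on the left.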
11 Linear Algebra: Vector Spaces and Operators, Part 4


Today, we'll develop the concept of an adjoint or Hermitian conjugate operator. This is a bit subtle, but we'll work 


towards it slowly! To get started, we'll need a bit of background first.


Definition 114 


A linear functional on a vector space $V$ is a linear map $\phi$ from $V$ to $\mathbb{F}$.


Since this map is linear, we know that
$$\phi(v_1) + \phi(v_2) = \phi(v_1 + v_2), \qquad \phi(av) = a\,\phi(v),$$
where both of these are equalities of numbers in our field.


Example 115 


A linear functional on $\mathbb{R}^3$ can take $(x_1, x_2, x_3)$ to the number $3x_1 - x_2 + 7x_3$.


We can also write this in vector notation if we have a (real) inner product:
$$\phi(v) = \phi(v_1, v_2, v_3) = (3, -1, 7) \cdot (v_1, v_2, v_3) = (u, v),$$
where $u$ is the vector $(3, -1, 7)$. So what we've done with the inner product is extract some vector $u$ out of the linear
functional which "defines" $\phi$, and this turns out to be true in general!


Theorem 116 


Any linear functional $\phi$ on a vector space $V$ can be uniquely represented as $\phi(v) = (u, v)$ for some vector $u \in V$
(and we denote the functional $\phi_u$).


Proof. We'll assume $V$ is finite-dimensional. Then let $(e_1, \dots, e_n)$ be an orthonormal basis of $V$, and we can write
any vector $v$ as
$$v = (e_1, v)\, e_1 + \cdots + (e_n, v)\, e_n.$$
Then applying $\phi$ and using linearity,
$$\phi(v) = (e_1, v)\, \phi(e_1) + \cdots + (e_n, v)\, \phi(e_n)$$
(because the inner products here are just numbers and can come out of the $\phi$). But now we can bring $\phi(e_1)$, which
is a number, into the inner product:
$$\phi(v) = (e_1 \phi(e_1)^*, v) + \cdots + (e_n \phi(e_n)^*, v) = \boxed{(e_1 \phi(e_1)^* + \cdots + e_n \phi(e_n)^*,\ v)},$$
where we are plugging the constants into the left entry, so we need to add conjugates. But now this last term is just
an inner product, and we can take $u = e_1 \phi(e_1)^* + \cdots + e_n \phi(e_n)^*$.
Uniqueness is pretty easy: suppose we could write $\phi(v) = (u, v) = (u', v)$. Then subtracting the two expressions
and using linearity, we must have $(u - u', v) = 0$ for all $v$, which means $u - u' = 0$ (for instance by plugging in
$v = u - u'$).
Like with many other proofs, the central idea here is just that we know how the linear map acts on each of the
basis vectors, and then we can determine everything directly. 
With that, we can now define the adjoint (physicists use “Hermitian conjugate”, which is more pictorial). This will 


be related to the concept of a Hermitian matrix, if we’ve seen that in linear algebra before. 


Definition 117 
The adjoint or Hermitian conjugate of an operator $T \in L(V)$ is denoted $T^\dagger$, and it is a map satisfying
$$(u, Tv) = (T^\dagger u, v).$$

To show that this is actually well-defined, note that for fixed $u$, the map $v \mapsto (u, Tv)$ is a linear functional (we can check the linearity axioms
because $T$ is a linear map), so there is some vector $u'$ such that it is equal to $(u', v)$, and here we're defining $T^\dagger u = u'$.


Now we know that $T^\dagger$ is some map from $V$ to $V$, but we don't really know that it's linear yet!


Proposition 118 


$T^\dagger \in L(V)$ for any linear operator $T$.


Proof. Notice that
$$(u_1 + u_2, Tv) = \boxed{(T^\dagger(u_1 + u_2), v)}$$
by definition, but we also have
$$(u_1 + u_2, Tv) = (u_1, Tv) + (u_2, Tv) = (T^\dagger u_1, v) + (T^\dagger u_2, v) = \boxed{(T^\dagger u_1 + T^\dagger u_2, v)}.$$
Comparing the two boxed statements (which hold for all $v$) shows that we do indeed have $T^\dagger(u_1 + u_2) = T^\dagger u_1 + T^\dagger u_2$. Similarly,
$$(au, Tv) = \boxed{(T^\dagger(au), v)},$$
but we also have
$$(au, Tv) = a^*(u, Tv) = a^*(T^\dagger u, v) = \boxed{(a T^\dagger u, v)},$$
and thus $T^\dagger(au) = a T^\dagger u$ and we've verified both linearity conditions.


So $T^\dagger$ is doing all of the right things, but we still don't really know what it's doing. So we'll show some more


properties and do some more examples. 


Proposition 119 


For any two linear operators $S, T$, we have $(ST)^\dagger = T^\dagger S^\dagger$.


Proof. This is some more symbol pushing:
$$((ST)^\dagger u, v) = (u, STv) = (S^\dagger u, Tv) = (T^\dagger S^\dagger u, v).$$


Proposition 120


For any linear operator $S$, $(S^\dagger)^\dagger = S$.

Proof. Notice that
$$(u, S^\dagger v) = \boxed{((S^\dagger)^\dagger u, v)},$$
but we can also flip the order of our arguments:
$$(u, S^\dagger v) = (S^\dagger v, u)^* = (v, Su)^* = \boxed{(Su, v)},$$
and equating the boxed statements gives us what we want.


Example 121 


Suppose our vector space is $V = \mathbb{C}^3$, where we represent our vectors as $(v_1, v_2, v_3)$. Suppose we have the linear
map
$$T(v_1, v_2, v_3) = (0v_1 + 2v_2 + iv_3,\ v_1 - iv_2 + 0v_3,\ 3iv_1 + v_2 + 7v_3)$$
(where we've written out the linear operator in components).


Our goal will be to find $T^\dagger$ and to write the matrix representations for both $T$ and $T^\dagger$ in the standard basis (that
is, with the three basis vectors $\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$, and $\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$).


We'll find $T^\dagger$ by using the basic property $(u, Tv) = (T^\dagger u, v)$. We'll first compute the left side: letting $u = (u_1, u_2, u_3)$, and implicitly using the standard inner product $(u, v) = u_1^* v_1 + u_2^* v_2 + u_3^* v_3$, we have
$$(u, Tv) = u_1^*(2v_2 + iv_3) + u_2^*(v_1 - iv_2) + u_3^*(3iv_1 + v_2 + 7v_3).$$
Since we want to set this equal to the inner product of (something) with $v$, we can rewrite this so that we separate
the $v$-components. Collecting terms, we see that
$$(u, Tv) = (u_2^* + 3iu_3^*)\, v_1 + (2u_1^* - iu_2^* + u_3^*)\, v_2 + (iu_1^* + 7u_3^*)\, v_3.$$
Since this is the inner product of the vector $T^\dagger u$ with $v$, we must have that the components of $T^\dagger u$ are
$$T^\dagger u = (u_2 - 3iu_3,\ 2u_1 + iu_2 + u_3,\ -iu_1 + 7u_3),$$
remembering that we need to complex conjugate each entry, so $i$s become $-i$s and we lose all of the conjugates on
our $u$s!
It's pretty important for us to understand how to get the matrices out of this, so we'll do a bit of the work here.
First of all, let's do $T$: we have
$$Te_1 = T(1, 0, 0) = (0, 1, 3i) = e_2 + 3ie_3,$$
and because $Te_i$ is supposed to be $\sum_k T_{ki}\, e_k$, this means that $T_{11} = 0$, $T_{21} = 1$, $T_{31} = 3i$. (In other words, $Te_1$ gives
us the first column.) Repeating this argument, we see that
$$T = \begin{pmatrix} 0 & 2 & i \\ 1 & -i & 0 \\ 3i & 1 & 7 \end{pmatrix}.$$
And finding $T^\dagger$ is exactly the same process: we find that
$$T^\dagger = \begin{pmatrix} 0 & 1 & -3i \\ 2 & i & 1 \\ -i & 0 & 7 \end{pmatrix}.$$

And now notice that these two matrices are Hermitian conjugates of each other: we get one from the other by
taking the transpose and complex conjugate! And this is not an accident, so let's try to get that more generally.
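
Before the general argument, here is a quick numerical check of this example: a sketch that builds the conjugate transpose of $T$ and confirms the defining property $(u, Tv) = (T^\dagger u, v)$ on one pair of test vectors. The helper names and the particular `u` and `v` are assumptions for illustration only.

```python
def inner(u, v):
    """Standard inner product (u, v) on C^n: conjugate-linear in u, linear in v."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dagger(M):
    """Conjugate transpose: (M^dagger)_{ij} = conj(M_{ji})."""
    n = len(M)
    return [[complex(M[j][i]).conjugate() for j in range(n)] for i in range(n)]

# Matrix of T from the example, in the standard basis.
T = [[0, 2, 1j],
     [1, -1j, 0],
     [3j, 1, 7]]
T_dag = dagger(T)

u = [1 + 1j, 2 - 1j, 3j]        # arbitrary test vectors
v = [2j, 1 + 0j, -1 + 4j]

lhs = inner(u, apply(T, v))      # (u, T v)
rhs = inner(apply(T_dag, u), v)  # (T^dagger u, v)
```

The computed `T_dag` has rows $(0, 1, -3i)$, $(2, i, 1)$, $(-i, 0, 7)$, matching the matrix found by hand, and `lhs` equals `rhs` up to rounding.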


We know by definition that we always have
$$(T^\dagger u, v) = (u, Tv).$$
Suppose that $u = e_i$ and $v = e_j$ are two elements of our orthonormal basis. This tells us that
$$(T^\dagger e_i, e_j) = (e_i, Te_j),$$
and now we can use the matrix action: since $T^\dagger e_i = \sum_k (T^\dagger)_{ki}\, e_k$ (this is an equation worth knowing by heart), the left
and right hand side will become
$$\Big(\sum_k (T^\dagger)_{ki}\, e_k,\ e_j\Big) = \Big(e_i,\ \sum_k T_{kj}\, e_k\Big).$$
Now we use orthonormality: the matrix terms are just numbers, so
$$\sum_k (T^\dagger)_{ki}^*\, \delta_{kj} = \sum_k T_{kj}\, \delta_{ik},$$
where the complex conjugate comes from us taking the number out from the left entry. And now the left side is $(T^\dagger)_{ji}^*$
and the right side is $T_{ij}$, and flipping indices and taking complex conjugates tells us that
$$(T^\dagger)_{ij} = (T_{ji})^*.$$
And now we've proved it: the $(i, j)$th entry of the matrix $T^\dagger$ comes from the transposed entry in $T$ after taking a
complex conjugate!


Fact 122 
Notice that this only worked because we have an orthonormal basis: in other matrix representations, the Hermitian
conjugate will not always be the conjugate transpose! Instead of having that $\delta_{ij}$ term above on both sides, we'll now get some
ugly number $(e_i, e_j) = g_{ij}$. And then we have
$$(T^\dagger)_{ki}^*\, g_{kj} = T_{kj}\, g_{ik},$$
where we're summing over $k$ on both sides, and now we don't have something quite as nice anymore. But if the
matrix of $g_{ij}$s is at least invertible, we can multiply by the inverse matrix on both sides, and then we get a formula for
$T^\dagger$ in terms of $g$ and its inverse and matrix multiplication.


The key point here is that the Hermitian conjugate has a basis-independent definition: it's not the conjugate
transpose in all bases, so it’s better to use the definition with the inner product above! 


We're now ready for a nice result which is only true in complex vector spaces: 
Theorem 123


Let $V$ be a complex inner product space. Then if $(v, Tv) = 0$ for all vectors $v$, then $T = 0$.


This is not true in real vector spaces: for example, take $V = \mathbb{R}^2$ and let $T$ be the operator that rotates us by 90
degrees. Then indeed $v$ and $Tv$ are always orthogonal, so $(v, Tv) = 0$. So this is another reason why complex vector


spaces are nice, and we'll be using this result soon! 


Proof. It seems at first sight that this will be a difficult proof: we need something that distinguishes real and complex
vector spaces. Our strategy will be to prove that $\boxed{(u, Tv) = 0}$ for all vectors $u, v$ in $V$, which is stronger because we
have different vectors on the left and right. Then we can set $u = Tv$ for each $v$, and that means $Tv$ must always be
zero by the inner product axioms!
To prove such a thing (which does require a leap of faith), the idea is to rewrite $(u, Tv)$ as a combination of
terms of the form $(w, Tw)$. First, we can try
$$(u + v, T(u + v)) - (u - v, T(u - v)),$$
and remember that by theorem assumption, both of these must always be zero. Evaluating by expanding cancels the
$(u, Tu)$ and $(v, Tv)$ terms, but we get cross terms of twice each of $(u, Tv)$ and $(v, Tu)$. So now we introduce the
complex numbers: we try adding in
$$(u + iv, T(u + iv)) - (u - iv, T(u - iv)).$$
Again, the terms $(u, Tu)$ and $(iv, T(iv))$ cancel out, but the cross terms this time are twice each of $i(u, Tv)$ and
$-i(v, Tu)$ (the negative sign because of conjugate homogeneity). But now we have a relative negative sign, and now
we can put everything together: it turns out that
$$(u, Tv) = \frac{1}{4}\Big[(u + v, T(u + v)) - (u - v, T(u - v)) + \frac{1}{i}(u + iv, T(u + iv)) - \frac{1}{i}(u - iv, T(u - iv))\Big].$$
Indeed, this gives us four terms of $(u, Tv)$ and zero terms of $(v, Tu)$ inside the brackets! But by theorem assumption,
the whole right side is always zero, so we've indeed shown $(u, Tv) = 0$ and thus $T = 0$, as desired.
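
The identity used in this proof can be checked numerically. In the sketch below, the matrix `T`, the vectors, and the helper names (`quad`, `lin`, `polarization`) are assumptions made for illustration. The last two lines revisit the 90-degree rotation: it gives zero on real vectors, but not on a complex one, which is why the theorem needs a complex vector space.

```python
def inner(u, v):
    """Standard inner product (u, v) on C^n: conjugate-linear in u, linear in v."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def quad(T, w):
    """Diagonal expression (w, T w)."""
    return inner(w, apply(T, w))

def lin(u, c, v):
    """The vector u + c v."""
    return [a + c * b for a, b in zip(u, v)]

def polarization(T, u, v):
    """Recover (u, T v) from four expressions of the form (w, T w)."""
    real_part = quad(T, lin(u, 1, v)) - quad(T, lin(u, -1, v))
    imag_part = quad(T, lin(u, 1j, v)) - quad(T, lin(u, -1j, v))
    return (real_part + imag_part / 1j) / 4

T = [[1 + 1j, 2], [-1j, 3]]     # arbitrary sample operator on C^2
u = [1 - 2j, 0.5j]
v = [2 + 1j, -1]
direct = inner(u, apply(T, v))
recovered = polarization(T, u, v)

R = [[0, -1], [1, 0]]           # rotation by 90 degrees
real_case = quad(R, [3, 4])     # vanishes for every real vector
complex_case = quad(R, [1, 1j]) # does not vanish once complex vectors are allowed
```

Here `recovered` agrees with `direct`, `real_case` is zero, and `complex_case` works out to $-2i$.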


Let’s come up with an application for this: 


Proposition 124 


If $(v, Tv)$ is real for all $v$, then $T^\dagger = T$ (the operator is Hermitian or self-adjoint).


Proof. Since this quantity $(v, Tv)$ is real, we know it's also equal to $(v, Tv)^*$, and thus this is equal to $(Tv, v)$. But
by the definition of the adjoint, $(v, Tv) = (T^\dagger v, v)$, and thus
$$(Tv, v) = (T^\dagger v, v) \implies ((T^\dagger - T)v, v) = 0$$
for all vectors $v$. Taking the complex conjugate, this means $(v, (T^\dagger - T)v) = 0$, and now using the above theorem, it means $T^\dagger - T = 0$,
so $T^\dagger = T$.


And this theorem actually goes both ways: the reverse direction is pretty easy to show. What's important here is
that having a Hermitian operator is the same as saying that $(v, Tv)$, the expectation value of $T$, is always real. And


that is important because we want to eventually get back to physics! 
We'll delay a discussion of diagonalization until next lecture; we'll first prove some basic properties of these


Hermitian operators. 


Proposition 125 


Eigenvalues of a Hermitian operator are real. 


Proof. Start with $(v, Tv)$. Suppose that $v$ is an eigenvector of $T$ with eigenvalue $\lambda$: then $Tv = \lambda v$, so
$$(v, Tv) = (v, \lambda v) = \boxed{\lambda\,(v, v)}.$$
But we can also move the $T$ to the other side:
$$(v, Tv) = (T^\dagger v, v) = (Tv, v)$$
(because $T$ is Hermitian), and then this simplifies to
$$= (\lambda v, v) = \boxed{\lambda^*(v, v)}.$$
Equating these two expressions, since $v$ is an eigenvector, we can assume it is nonzero, so $(v, v) \ne 0$. Dividing that
out yields $\lambda = \lambda^*$, and thus $\lambda$ must be real, as desired.


Proposition 126 


Eigenvectors of a Hermitian operator with different eigenvalues are orthogonal. 


Proof. Suppose we have two eigenvectors $v_1, v_2$ with eigenvalues $\lambda_1, \lambda_2$ respectively. We're assuming $\lambda_1 \ne \lambda_2$; the
idea is that we might get degeneracies where higher-dimensional subspaces all have the same eigenvalue. (In that case,
every eigenvector in the subspace with some fixed eigenvalue is orthogonal to every eigenvector in another subspace
with a different fixed eigenvalue!)


We'll consider the inner product $(v_1, Tv_2)$. We can evaluate this in two ways: first of all,
$$(v_1, Tv_2) = (v_1, \lambda_2 v_2) = \boxed{\lambda_2 (v_1, v_2)},$$
but we also have that
$$(v_1, Tv_2) = (Tv_1, v_2) = (\lambda_1 v_1, v_2) = \boxed{\lambda_1 (v_1, v_2)}$$
because our eigenvalues are already real. Since these expressions are equal and $\lambda_1$ and $\lambda_2$ are different, this means
$(v_1, v_2) = 0$ as desired.


Aside from the class of Hermitian operators, there’s also another class that are as important: unitary operators. 


Mathematicians say that such operators are an isometry: they preserve length, which means that 
$$|Su| = |u|$$
for any vector $u \in V$.
Example 127


Consider the operator $T = \lambda I$, which multiplies any vector by $\lambda$. If $|\lambda| = 1$, lengths are preserved, and we have
an isometry (which is just a rotation in the complex plane). This is because
$$|Tu| = |\lambda I u| = |\lambda|\,|u| = |u|.$$


Notice that any unitary operator S only sends the zero vector to zero, because lengths are preserved! And this 
means that the null space of S is trivial, and therefore S is actually invertible. 


But to make more progress, let’s work with the equations a little bit more: 
$$|Su| = |u| \implies (Su, Su) = (u, u).$$
We can move the $S$ from one side to another and pick up a dagger:
$$(u, S^\dagger S u) = (u, u) \implies (u, (S^\dagger S - I)u) = 0.$$
Since this is true for all $u$, we can use our favorite theorem above to find that $S^\dagger S = I$. As a nice property, notice
that this tells us that
$$\boxed{(u, v) = (u, S^\dagger S v) = (Su, Sv)}.$$


And now we have a way to define unitary operators formally: 


Definition 128 


An operator $U$ is unitary if $U^{-1} = U^\dagger$.


(We'll assume that being an inverse from the left is the same as being an inverse from the right.) Because of the 
boxed equation above, this unitary operator preserves the inner product, not just the norms of vectors! 
It turns out that unitary operators have particular significance for bases: suppose we have an orthonormal basis $(e_1, \dots, e_n)$. Then defining the basis

$$f_i = U e_i, \quad U \text{ unitary},$$

notice that

$$(f_i, f_j) = (Ue_i, Ue_j) = (e_i, e_j) = \delta_{ij},$$

so our new basis is actually orthonormal as well! And playing a bit more with some indices, we can find that the entries of the unitary matrix

$$U_{ki} = (e_k, U e_i)$$

can also be written in the $f$ basis:

$$U^{(f)}_{ki} = (f_k, U f_i) = (U e_k, U f_i) = (e_k, f_i) = (e_k, U e_i),$$

which is exactly the same expression! And this means that a unitary operator looks the same in both orthonormal bases.
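As a quick numerical sanity check of these statements, here is a minimal sketch in plain Python. The specific matrix `U` and the test vectors are illustrative assumptions, not taken from the lecture; the helper names are also made up for this example.

```python
import math

def inner(u, v):
    """Inner product (u, v), conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def apply(m, v):
    """Matrix m acting on the column vector v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def dagger(m):
    """Hermitian conjugate: transpose and conjugate every entry."""
    return [[complex(m[j][i]).conjugate() for j in range(len(m))]
            for i in range(len(m[0]))]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

S = 1 / math.sqrt(2)
U = [[S, S], [1j * S, -1j * S]]          # illustrative unitary matrix (assumption)

u_dagger_u = matmul(dagger(U), U)        # should be the identity

u, v = [1 + 2j, -1j], [0.5, 3 - 1j]      # arbitrary test vectors
before = inner(u, v)
after = inner(apply(U, u), apply(U, v))  # should equal `before`

E = [[1, 0], [0, 1]]                     # orthonormal basis e_1, e_2
F = [apply(U, e) for e in E]             # new basis f_i = U e_i
f_gram = [[inner(fi, fj) for fj in F] for fi in F]  # should be delta_ij
```

Up to floating-point rounding, `u_dagger_u` and `f_gram` come out as the identity, and `before` agrees with `after`.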
12 February 19, 2020

We'll start by wrapping up our discussion from last time: at the end, we were trying to define an inner product on the space of matrices $M_N(\mathbb{C})$,

$$(A, B) = \mathrm{tr}(A^\dagger B).$$

To see what this actually is, we can write it out in index notation: we're summing over all diagonal entries

$$= \sum_j (A^\dagger B)_{jj},$$

and then the matrix multiplication is another sum

$$= \sum_j \sum_i (A^\dagger)_{ji} B_{ij}.$$


Fact 129 


It's good to know how to write out matrix multiplication in index notation: in general, 


$$(A_1 A_2)_{ij} = \sum_k (A_1)_{ik} (A_2)_{kj}.$$


(This is because we're dotting a row with a column, so the first index should be fixed for the first matrix and the 


second index should be fixed for the second matrix.) 


As will be discussed (or was discussed in the lecture videos), we have $(A^\dagger)_{ji} = A_{ij}^*$ if we're using an orthonormal basis (which we'll assume exists; we'll prove this later), so the inner product becomes

$$(A, B) = \sum_{i,j} A_{ij}^* B_{ij}.$$

And now we have the definition of this inner product in plain English: basically, we just take the corresponding elements in the two matrices and multiply them component-wise (with a conjugate)! This means that

$$(A, A) = \sum_{i,j} |A_{ij}|^2$$

is only zero if all the entries are zero, which is what we want.
It may be convenient to add an extra factor in front when we're using $M_2(\mathbb{C})$: if we define $(A, B) = \frac{1}{2}\mathrm{tr}(A^\dagger B)$, then

$$(\sigma_1, \sigma_1) = |\sigma_1|^2 = \tfrac{1}{2}\mathrm{tr}(\sigma_1^\dagger \sigma_1) = \tfrac{1}{2}\mathrm{tr}(\sigma_1^2) = \tfrac{1}{2}\mathrm{tr}(I) = 1.$$

(We've used here that the Pauli matrices are Hermitian.) And now if we have a general linear combination $A = a_1\sigma_1 + a_2\sigma_2 + a_3\sigma_3$, we have that

$$|A|^2 = (A, A) = \sum_{1 \le i, j \le 3} a_i^* a_j (\sigma_i, \sigma_j).$$

(We don't need to take the Hermitian conjugate of the $\sigma_i$s, because they are already Hermitian.) Under this inner product, we can check that the $\sigma_i$s form an orthonormal set (for example, because $\sigma_i \sigma_j = \delta_{ij} I + i\epsilon_{ijk}\sigma_k$), and thus we only get a nonzero contribution from $i = j$, and we have

$$|A|^2 = |a_1|^2 + |a_2|^2 + |a_3|^2 \implies |A| = \sqrt{|a_1|^2 + |a_2|^2 + |a_3|^2}.$$
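The orthonormality claim is easy to verify by brute force. The following is a small sketch in plain Python, assuming the standard convention for the Pauli matrices; the function names and the sample coefficients are hypothetical choices for illustration.

```python
# Pauli matrices in the standard convention, as nested lists.
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def inner(a, b):
    """(A, B) = (1/2) tr(A^dagger B) = (1/2) sum_ij conj(A_ij) * B_ij."""
    total = 0
    for i in range(2):
        for j in range(2):
            total += complex(a[i][j]).conjugate() * b[i][j]
    return total / 2

# Gram matrix of the three Pauli matrices: expected to be the 3x3 identity.
gram = [[inner(s, t) for t in SIGMA] for s in SIGMA]

def norm(coeffs):
    """|A| for A = a1*sigma1 + a2*sigma2 + a3*sigma3, computed from the matrix itself."""
    a = [[sum(c * s[i][j] for c, s in zip(coeffs, SIGMA)) for j in range(2)]
         for i in range(2)]
    return abs(inner(a, a)) ** 0.5

sample_norm = norm([3, 4j, 0])   # compare with sqrt(|3|^2 + |4i|^2) = 5
```

Here `norm` never uses the closed formula, so its agreement with $\sqrt{|a_1|^2 + |a_2|^2 + |a_3|^2}$ is a genuine check.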
Remark 130. It's okay to put a number like $\frac{1}{2}$ in front in the definition of our inner product, because there isn't any physical interpretation of this “length” that we've defined yet. And it's just useful to give us this particular form of the answer: it's not directly observable! We're basically picking our units here.


Let's talk a bit more about matrix representations — there’s a few different ways they can come up. Suppose that 


we're working in three dimensions: then we can write any vector in terms of the basis vectors

$$v = v_1 e_1 + v_2 e_2 + v_3 e_3 = (v_1, v_2, v_3).$$

Sometimes, we're told what form a matrix takes based on what it does to the basis vectors: for example,

$$T e_1 = a_1 e_1 + b_1 e_2 + c_1 e_3,$$
$$T e_2 = a_2 e_1 + b_2 e_2 + c_2 e_3,$$
$$T e_3 = a_3 e_1 + b_3 e_2 + c_3 e_3.$$

Then how does this matrix look? Well, the image of $e_1$ is the first column of our matrix (we can check this by multiplying our matrix with $\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$), so

$$T = \begin{pmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{pmatrix}.$$

But there's another way that we often might see $T$ represented: perhaps we're told that a vector $v$ has three components $(v_1, v_2, v_3)$, and then the new three components are

$$T(v_1, v_2, v_3) = (a_1 v_1 + a_2 v_2 + a_3 v_3,\ b_1 v_1 + b_2 v_2 + b_3 v_3,\ c_1 v_1 + c_2 v_2 + c_3 v_3).$$

Then the matrix looks identical, but notice that our coefficients have been relabeled. This means that acting on the components gives us the rows instead of the columns (because we need $a_1, a_2, a_3$ to be dotted with $v_1, v_2, v_3$). So we should try not to confuse these two different perspectives!


One last note about unitary operators: the definition of a unitary operator $S$ is that for all vectors $v$,

$$|Sv| = |v| \implies (Sv, Sv) = (v, v).$$

Basically, if we have two $S$'s, we can delete them. And alternatively, we can move one of the $S$'s over to the other side, which shows us that

$$(v, S^\dagger S v) = (v, v) \implies (v, (S^\dagger S - I)v) = 0.$$


One important result for complex vector spaces: 


Proposition 131 


If a linear operator T satisfies (v, Tv) = 0 for all vectors v in a complex vector space, then T = 0. 


The result here is particularly interesting because it's not true in real vector spaces! (For example, consider $\mathbb{R}^2$ and the operator $T$ which rotates everything by 90 degrees.)
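To see the counterexample concretely, here is a minimal sketch in plain Python. It takes the 90-degree rotation matrix and shows that $(v, Tv)$ vanishes for a real vector but not once complex components are allowed; the particular vectors are arbitrary illustrative choices.

```python
def inner(u, v):
    """Inner product (u, v), conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def apply(m, v):
    """Matrix m acting on the column vector v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

ROT90 = [[0, -1], [1, 0]]            # counterclockwise rotation by 90 degrees

real_v = [3.0, -2.0]
real_value = inner(real_v, apply(ROT90, real_v))           # v is perpendicular to Tv

complex_v = [1, 1j]
complex_value = inner(complex_v, apply(ROT90, complex_v))  # nonzero over C
```

The same matrix, viewed as an operator on $\mathbb{C}^2$, therefore does not satisfy the hypothesis of the proposition, which is consistent with $T \neq 0$.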


This means that for any unitary operator,

$$S^\dagger S - I = 0 \implies S^\dagger = S^{-1}.$$

We're working with finite-dimensional vector spaces here, so $S^\dagger$ being a left inverse means that it is also a right inverse. And this is a nice property to have!


Now, we'll turn our attention to some of the exercise problems. 


Example 132 


Suppose $e_1, e_2$ form a basis that is not orthonormal: for example, say that

$$(e_1, e_1) = 1, \quad (e_2, e_2) = b, \quad (e_1, e_2) = 0.$$

Also, say that we have an operator $T$ which satisfies

$$T e_1 = e_1 + i e_2, \quad T e_2 = e_1 - e_2.$$

Then what can we say about the matrix representation of $T^\dagger$, the adjoint operator?

In an orthonormal basis, the adjoint operator $T^\dagger$ has a matrix representation which is just the Hermitian conjugate of the original. But that's not quite true here, and we need to work through things again.

Remember that we define the adjoint operator in a basis-independent way: we say that

$$(Tu, v) = (u, T^\dagger v).$$

Let's represent our operators in matrix form though: for example,

$$T = \begin{pmatrix} 1 & 1 \\ i & -1 \end{pmatrix}.$$

We can compute things by taking specific values of $u$ and $v$: for example,

$$(T e_2, e_2) = (e_1 - e_2, e_2) = -b,$$

but this is also equal to $(e_2, T^\dagger e_2)$. So that tells us something about one of the entries of $T^\dagger$: we can work things through, and we'll see that some extra $b$ terms come through.
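One way to organize the full computation is through the Gram matrix $G_{ij} = (e_i, e_j)$, which is not introduced in the lecture and is used here as an assumption: writing $(u, v) = \sum_{ij} u_i^* G_{ij} v_j$ for component vectors, the defining relation $(Tu, v) = (u, T^\dagger v)$ becomes $T^\dagger = G^{-1} T^H G$, where $T^H$ is the conjugate transpose of the matrix of $T$. The sketch below checks this in plain Python with the illustrative value $b = 2$ (assumed real and positive).

```python
def dagger(m):
    """Conjugate transpose of a square matrix."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def apply(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def inner(u, v, gram):
    """(u, v) = sum_ij conj(u_i) G_ij v_j for components in a non-orthonormal basis."""
    return sum(complex(u[i]).conjugate() * gram[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))

B = 2.0                                  # illustrative value of (e_2, e_2)
GRAM = [[1.0, 0.0], [0.0, B]]            # G_ij = (e_i, e_j)
T = [[1, 1], [1j, -1]]                   # columns are T e_1 and T e_2

T_ADJ = matmul(inv2(GRAM), matmul(dagger(T), GRAM))   # matrix of T^dagger

E2 = [0, 1]
lhs = inner(apply(T, E2), E2, GRAM)      # (T e_2, e_2), expected -b
rhs = inner(E2, apply(T_ADJ, E2), GRAM)  # (e_2, T^dagger e_2)
```

With these inputs the off-diagonal entries of `T_ADJ` are $-ib$ and $1/b$, so the adjoint matrix differs from the plain conjugate transpose exactly by the extra factors of $b$.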


Finally, let's talk a bit about projectors. Suppose we have a vector space $V$, and we have a subspace $U$: define

$$U^\perp = \{ v \in V : (u, v) = 0 \ \ \forall u \in U \}.$$

Basically, any vector in $U$ is perpendicular to any vector in $U^\perp$.

Theorem 133

We can write $V$ as the direct sum

$$V = U \oplus U^\perp.$$

To show this, we just need to write down how to break a vector $v$ up into its components in $U$ and $U^\perp$. Assume that $(e_1, \dots, e_n)$ is an orthonormal basis of $U$: first we can rewrite

$$v = (e_1, v)e_1 + \dots + (e_n, v)e_n + \big[v - (e_1, v)e_1 - \dots - (e_n, v)e_n\big].$$

Now the first sum up to $(e_n, v)e_n$ belongs to $U$, but the remaining part $v - (e_1, v)e_1 - \dots - (e_n, v)e_n$ is orthogonal to all basis vectors $e_1, \dots, e_n$, so it is in $U^\perp$! And the rest of the theorem follows by showing that this decomposition is unique because $U$ and $U^\perp$ share only the zero vector.
This allows us to define the projector onto $U$:

$$P_U(v) = (e_1, v)e_1 + \dots + (e_n, v)e_n.$$

This says a few things that are actually important:

$$P_U(e_1) = (e_1, e_1)e_1 = e_1,$$

and this is true for all basis vectors. So $P_U$ acts as the identity on $U$ (which makes sense, because we're just “projecting down” to it).

On the other hand, say we have a $w \in U^\perp$. Then $P_U(w) = 0$ is the zero vector, because $(e_i, w) = 0$ for all $i$. And with this, we can figure out all of the behavior of $P_U$! For example,

$$P_U P_U(v) = P_U(v),$$

because the vector $P_U(v)$ is in $U$ and thus it is fixed by the second $P_U$. And thus the operator $P_U P_U = P_U$, which is an interesting property to have. It's not true that any operator with this property is an orthogonal projector, but we can still say something in general: we know that the eigenvalues must satisfy $\lambda^2 = \lambda \implies \lambda = 1, 0$, which means there are a lot of repeated eigenvalues! And it's pretty clear what's happening to the eigenvectors in an orthogonal projector, too: the vectors in $U$ have $\lambda = 1$, and the vectors in $U^\perp$ have $\lambda = 0$. The matrix representation in our specific basis is then (in block form)

$$P_U = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix},$$

where $I$ is an identity matrix of size equal to the dimension of $U$. And that number is also equal to the trace of $P_U$, as well as its rank.


13 Dirac's Bra and Ket Notation


Dirac bra-kets are a notation that is pretty nice for quantum mechanics — it’s very convenient for some physics 
problems, but it’s just another way of writing mathematics. We'll need to take two steps: going from inner 
products to bra-kets, and going from bra-kets to bras and kets. 

The first of these steps is just a change of notation: instead of denoting an inner product as $(u, v)$, we'll denote it $\langle u|v\rangle$. (Basically, we put a vertical bar instead of a comma.) This is called a bra-ket, and recall that these two objects $u$ and $v$ inside the bra-ket are inside our vector space.


So things aren’t too complicated here, but we can still try doing some practice: 


Example 134 


By linearity, we know that $(u, c_1 v_1 + c_2 v_2) = c_1 (u, v_1) + c_2 (u, v_2)$. This becomes

$$\langle u | c_1 v_1 + c_2 v_2 \rangle = c_1 \langle u|v_1\rangle + c_2 \langle u|v_2\rangle.$$

Conjugate homogeneity gives us different constants on the left:

$$\langle c_1 u_1 + c_2 u_2 | v \rangle = c_1^* \langle u_1|v\rangle + c_2^* \langle u_2|v\rangle.$$

Example 135

We can write that the norm of a vector is $|v|^2 = \langle v|v\rangle$, and now the Schwarz inequality reads $|\langle u|v\rangle| \le |u||v|$.

Example 136

In an orthonormal basis (in other words, a basis where $\langle e_i|e_j\rangle = \delta_{ij}$), we can write an arbitrary vector $v$ as

$$v = \sum_i e_i \langle e_i|v\rangle.$$

Again, there isn't really much that's new here: we're just getting used to a slightly different notation. So let's move on to the second part: going from bra-kets to kets and bras. The idea this time is that we want to go from an inner product to two separate quantities: we'll spread the object $\langle u|v\rangle$ apart to get two objects $\langle u|$ and $|v\rangle$, called a bra and ket respectively.

We'll start with the kets: these turn out to just be regular vectors. If we have a vector $v$ in our vector space $V$, then we'll just say that $|v\rangle$ is also in that same vector space, so the meaning isn't changing here. So the ket symbol is kind of like putting an arrow above the letter $v$: it tells us that we have a vector. And importantly, sometimes we use different labels which represent properties of the vector instead of the vector itself.

To explain this a bit more, suppose we have a linear operator $T$ acting on a vector $v = |v\rangle$. Then we can say that $T|v\rangle = |Tv\rangle$: everything is consistent here because all of our labels are regular vectors. But in contrast, consider our spin states, which we often label $|+\rangle$, $|-\rangle$, $|\hat{n}, +\rangle$, and so on. These labels $+$ and $-$ have nothing to do with the vector space itself, so we cannot say that $S_z|+\rangle = |S_z +\rangle$: $S_z$ acts on the ket vector represented by the $+$ state, but it can't act on $+$ itself.

Other than that, though, ket vectors are familiar: they are the usual objects in our vector space. So it just remains to understand what bras are. Recall that we introduced the concept of a linear functional $\phi$, which acts on vectors and gives us numbers. We proved that for every $\phi$, we could find a unique vector $u$ in our vector space such that $\phi(v) = (u, v)$. And we even called these functionals $\phi_u$. Remember that these form their own vector space, because they can be added or scaled by constants. So these form some vector space $V^*$, called the dual vector space of $V$. Notice that this space $V^*$ is parameterized by vectors $u \in V$, so it actually has the same dimension as $V$.

But notice that a bra acts in the same way as one of these linear functionals: a bra $\langle u|$ is labeled by a vector, just like a linear functional, and it can act on a vector $v$ as well:

$$\langle u| : v \mapsto \langle u|v\rangle.$$

So let's just make that our definition: the bra vector $\langle u|$ is the linear functional $\phi_u$.

We can now switch over to a matrix formulation and try to understand bras and kets as row and column vectors, respectively. Remember that the inner product between two vectors $a = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}$ and $b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}$ is

$$\langle a|b\rangle = a_1^* b_1 + \dots + a_n^* b_n.$$

But notice that this also works if we think of $\langle a|$ as the row vector $(a_1^*, \dots, a_n^*)$: then indeed we have matrix multiplication

$$\begin{pmatrix} a_1^* & \cdots & a_n^* \end{pmatrix} \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix},$$

which gives us a single number equal to the inner product $\langle a|b\rangle$. So it's easy to construct a bra row vector: take the column vector, map it into a row, and take the complex conjugate of all entries.
Note, by the way, that our bra vectors have pretty nice properties (because they are secretly still linear functionals): for example,

$$\langle u|v\rangle = \langle u'|v\rangle \quad \forall v \in V$$

implies that $u = u'$, so $\langle u| = \langle u'|$. (To prove this, note that the equation can be rewritten as $\langle u - u'|v\rangle = 0$ for all $v$, and choosing $v = u - u'$ gives

$$\langle u - u'|u - u'\rangle = 0 \implies u - u' = 0.)$$

So now it makes sense to add bra vectors: since

$$\langle v_1 + v_2|u\rangle = \langle v_1|u\rangle + \langle v_2|u\rangle = \big(\langle v_1| + \langle v_2|\big)|u\rangle$$

for any vector $u$, we know that $\langle v_1 + v_2| = \langle v_1| + \langle v_2|$. Similarly, for any $a \in \mathbb{C}$, we know that

$$\langle av|u\rangle = a^* \langle v|u\rangle = \big(a^* \langle v|\big)|u\rangle,$$

which means that the bra $\langle av|$ is also just $a^* \langle v|$. What we've just derived are properties of linear functionals, and one related idea is that we often want to turn a ket into a bra. For instance, if we have $v = a_1 v_1 + a_2 v_2$, we can say that

$$|v\rangle = |a_1 v_1 + a_2 v_2\rangle = a_1 |v_1\rangle + a_2 |v_2\rangle,$$

but the dual space bra is slightly more complicated:

$$\langle v| = \langle a_1 v_1 + a_2 v_2| = a_1^* \langle v_1| + a_2^* \langle v_2|.$$

So passing from a ket to a bra can be done by complex conjugating every coefficient and turning every ket vector into a bra vector!
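The row-vector picture makes this rule easy to test. Below is a minimal sketch in plain Python, where a ket is a list of components and the hypothetical helper `bra` returns the conjugated row; the vectors and coefficients are arbitrary illustrative values.

```python
def bra(ket):
    """Row vector for <v|: same entries as the ket, complex conjugated."""
    return [complex(x).conjugate() for x in ket]

def act(bra_row, ket):
    """A bra acting on a ket: row times column, giving a single number."""
    return sum(r * k for r, k in zip(bra_row, ket))

def combine(a1, v1, a2, v2):
    """Components of the ket a1|v1> + a2|v2>."""
    return [a1 * x + a2 * y for x, y in zip(v1, v2)]

v1, v2 = [1, 2j], [3 - 1j, 0.5]
a1, a2 = 2 + 1j, -1j

direct = bra(combine(a1, v1, a2, v2))                  # <a1 v1 + a2 v2|
rule = [complex(a1).conjugate() * x + complex(a2).conjugate() * y
        for x, y in zip(bra(v1), bra(v2))]             # a1* <v1| + a2* <v2|

norm_squared = act(bra(v1), v1)                        # <v1|v1> = |1|^2 + |2i|^2
```

The two rows `direct` and `rule` agree entry by entry, which is the conjugation rule in components.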

And now we can put everything back together: recall from earlier that in an orthonormal basis, we can write any vector as $v = \sum_i e_i \langle e_i|v\rangle$. We'll change our notation a little bit: the ket vector $|e_i\rangle$ will just be written as $|i\rangle$ (so we're using the label instead of the vector itself, and this way we can write less). And now we find (in bra-ket notation) that

$$|v\rangle = \sum_i |i\rangle \langle i|v\rangle,$$

and we'll soon see the significance of writing our expressions in this way.

Up until now, everything has just been changing notation, but once we introduce operators in bra-ket notation, we'll see the properties work out more practically. Recall that in an orthonormal basis, an easy way to get the matrix coefficients is to find

$$T_{ij} = (e_i, T e_j).$$

The indices match easily (so this formula is easy to remember). If we want to remember how to prove it, note that $T e_j = \sum_k T_{kj} e_k$, and then we use orthonormality. But now we can rewrite this in bra-ket notation:

$$T_{ij} = \langle e_i|T e_j\rangle = \langle e_i|T|e_j\rangle$$

(remembering that this is the meaning of $|T e_j\rangle$), and now we can just label our bra and ket basis vectors with numbers: we have

$$T_{ij} = \langle i|T|j\rangle.$$
And what's nice about this is that we can now write the entire operator as a sum:

Proposition 137

For any operator $T$, we can write

$$T = \sum_{i,j} |i\rangle T_{ij} \langle j|.$$

Here, the $T_{ij}$ is just a number. We put it in the middle to be convenient, but it doesn't really act on the ket or bra vector here. An important point: an object of the form $|u\rangle\langle v|$, where the ket and bra seem to be in the wrong direction, is an operator. After all, feeding in any vector $|w\rangle$ will give us

$$|u\rangle\langle v|\,|w\rangle = \langle v|w\rangle \cdot |u\rangle,$$

which is another vector. Here's where the Dirac notation is helping us out: we can basically just “move bras and kets close to each other!” And in the proposition above, we're just summing various operators.

Proof. It suffices to calculate $\langle p|T|q\rangle$ for all numbers $p, q$, and show that this quantity is indeed equal to $T_{pq}$. Plugging in the operator above, we get (moving the $\langle p|$ inside the sum)

$$\sum_{i,j} \langle p|i\rangle T_{ij} \langle j|q\rangle,$$

and now we can let the outside objects become bra-kets:

$$= \sum_{i,j} \delta_{pi} T_{ij} \delta_{jq},$$

since we have an orthonormal basis. And this is only nonzero if $i = p$ and $j = q$, which indeed gives us $T_{pq}$ as desired.

Another way to think of this is that each term $|i\rangle T_{ij} \langle j|$ corresponds to the $(i, j)$th entry of the matrix. So each of these terms is an individual entry, and the object $|i\rangle\langle j|$ is the matrix with a 1 in the $(i, j)$th place and zeros everywhere else. This presentation will be important to keep in mind going forward.
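The proposition holds in any orthonormal basis, not only the standard one. As a rough numerical illustration, the sketch below (plain Python) computes $T_{ij} = \langle i|T|j\rangle$ in a rotated orthonormal basis of $\mathbb{C}^2$ and rebuilds the operator from the sum of outer products. The basis, the matrix `T`, and the helper names are all assumptions made for the example.

```python
import math

def braket(u, v):
    """<u|v>, conjugating the components of u."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def apply(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def outer(ket, bra_label):
    """Matrix of |ket><bra_label|: entry (r, c) is ket_r * conj(bra_label_c)."""
    return [[k * complex(b).conjugate() for b in bra_label] for k in ket]

S = 1 / math.sqrt(2)
BASIS = [[S, S], [S, -S]]          # an orthonormal basis other than the standard one
T = [[1, 2 - 1j], [3j, 4]]         # arbitrary operator, given in the standard basis

def element(i, j):
    """Matrix element T_ij = <i|T|j> in the chosen basis."""
    return braket(BASIS[i], apply(T, BASIS[j]))

def rebuild():
    """Sum over i, j of |i> T_ij <j|, returned as a standard-basis matrix."""
    total = [[0, 0], [0, 0]]
    for i in range(2):
        for j in range(2):
            term = outer(BASIS[i], BASIS[j])
            t_ij = element(i, j)
            for r in range(2):
                for c in range(2):
                    total[r][c] += t_ij * term[r][c]
    return total

rebuilt = rebuild()
```

Up to rounding, `rebuilt` matches `T`, even though the individual numbers `element(i, j)` differ from the standard-basis entries.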


Example 138 


Consider the operator $|m\rangle\langle m|$.

Applying this operator to any vector $|v\rangle$ will give us an object proportional to $|m\rangle$, so this projects down to the subspace spanned by $|m\rangle$. If we call this operator $P_m$, notice that we also have the nice property

$$P_m^2 = |m\rangle\langle m|m\rangle\langle m|,$$

and the inner two terms just evaluate to 1 by orthonormality, so this is

$$P_m^2 = |m\rangle\langle m| = P_m.$$

In particular, the matrix representation of this operator is a diagonal matrix with a single 1, so this has trace 1: it's a rank one projector. Similarly, the object

$$P_{m,n} = |m\rangle\langle m| + |n\rangle\langle n| \quad (m \neq n)$$

is a rank two projector, because it will give something proportional to $|m\rangle$ plus something proportional to $|n\rangle$. (This has two ones in the matrix representation, so the trace is 2.) But if we repeat this logic, eventually we get to the point where we go through all states: the operator

$$|1\rangle\langle 1| + \dots + |N\rangle\langle N|$$

is just a diagonal matrix with a 1 on every diagonal entry, so we've found the identity matrix! So this is an important property:

Proposition 139

If $|i\rangle$ index an orthonormal basis, then

$$\sum_i |i\rangle\langle i| = I.$$

This is a completeness relation, and it might actually look familiar: recall that we wrote an arbitrary vector as

$$|v\rangle = \sum_i |i\rangle\langle i|v\rangle.$$

Even though the interpretation of this is that $v$ is the sum of numbers $\langle i|v\rangle$ times basis vectors $|i\rangle$, we can also think of the right hand side as the identity operator acting on $v$! So ambiguity of notation actually leads to nontrivial mathematical results here.


Fact 140 
Often, we'll simplify an expression by introducing this “complete set of states.” We should keep this in mind when 


we're working through problems! 


Example 141 


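Recall the two-dimensional vector space of spin states: since we have $|+\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $|-\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, we should have

$$|+\rangle\langle +| + |-\rangle\langle -| = I.$$

Indeed, writing things out in vector form, we have

$$\begin{pmatrix} 1 \\ 0 \end{pmatrix}\begin{pmatrix} 1 & 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix}\begin{pmatrix} 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix},$$

which is indeed the identity matrix.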


The last basic mathematical idea we'll talk about is that of the adjoint operator. Recall that the defining property for an adjoint $T^\dagger$ is that

$$(T^\dagger u, v) = (u, Tv) \implies \langle T^\dagger u|v\rangle = \langle u|Tv\rangle.$$

We can simplify the left hand side by flipping the arguments and taking a complex conjugate: this equation then simplifies to

$$\langle v|T^\dagger u\rangle^* = \langle u|T|v\rangle \implies \langle v|T^\dagger|u\rangle^* = \langle u|T|v\rangle,$$

and now taking another complex conjugate tells us the defining bra-ket relation for adjoint operators:

$$\boxed{\langle v|T^\dagger|u\rangle = \langle u|T|v\rangle^*}.$$

This is useful, because it tells us what we derived earlier: to find the matrix elements of $T^\dagger$, we take the matrix elements of $T$, flipping the row and column, and complex conjugate them! Notably, if we look at the right side of this equation, we can rewrite it as

$$\langle u|Tv\rangle^* = \langle Tv|u\rangle.$$

And now we can delete the $u$s on both sides to conclude that

$$\langle Tv| = \langle v|T^\dagger.$$

This is another property of bra vectors, and in this case notice that the corresponding ket vector $Tv$ is $|Tv\rangle = T|v\rangle$.


Fact 142 


In other words, to get the associated bra of $T|v\rangle$, we flip the whole object around, remembering to take the adjoint of $T$ as well.

In this notation, the Hermitian operators still satisfy $T^\dagger = T$, so we have

$$\langle Tu|v\rangle = \langle u|Tv\rangle,$$

which means that we can move a Hermitian operator between bras and kets freely.

Proposition 143

If $T = |u\rangle\langle v|$, then $T^\dagger = |v\rangle\langle u|$.

(This is an exercise that we can check on our own.)

Now that we understand the flexibility of this notation (we can always go back to the conventional definitions of inner products and linear functionals), we're ready to bring in the physics again with position and momentum states and the $\hat{x}$ and $\hat{p}$ operators, with the idea of a non-denumerable basis. (Everyone likes to use Dirac notation here, because it helps avoid confusion between similar objects.)

The vector space we're talking about here is the state space, and we're going to introduce position states that look like $|x\rangle$. Intuitively, this corresponds to a particle at the position $x$, and now we have to be careful. For example,

$$|ax\rangle \neq a|x\rangle,$$

because the left hand side represents a particle at the coordinate position $ax$, while the right hand side represents a particle at the coordinate position $x$, but where the wavefunction has a different amplitude! Similarly, $|-x\rangle \neq -|x\rangle$, and $|x_1 + x_2\rangle \neq |x_1\rangle + |x_2\rangle$. The fundamental reason for the confusion here is that we're labeling with $x$, but our vectors aren't that directly connected to the $x$'s: they're wavefunctions! As another way to think about this, suppose we have a three-dimensional vector $\vec{x}$. Then we have the ket $|\vec{x}\rangle$, which corresponds to the particle at the position $\vec{x}$. But our vector space isn't the real $\mathbb{R}^3$ that $\vec{x}$ lives in: it's the infinite-dimensional complex vector space of wavefunctions. So we should be very careful about labels when working in this kind of abstraction, and hopefully the introduction of the ket notation is helping with this!

So the reason we like this Dirac notation is that we can distinguish the number or coordinate $x$ from the vector $|x\rangle$. The states $|x\rangle$ form a basis of our state space (where $x \in \mathbb{R}$, since we're working in a single dimension), and while the $x$s can be changed by real numbers, our states can be multiplied by complex numbers. So if we want to define our infinite-dimensional vector space, we can't just make a list of our basis vectors: it's a nondenumerable basis, because there are uncountably many basis elements. So we'll use a slightly different inner product here: we'll have

$$\langle x|y\rangle = \delta(x - y),$$

and to deal with normalizability issues, we need to manipulate what we did with finite-dimensional vector spaces before: instead of having $I = \sum_i |i\rangle\langle i|$, we now have

$$I = \int dx\, |x\rangle\langle x|.$$

To make sure we have the correct factor in front, let's have this identity operator act on the vector $|y\rangle$: then

$$|y\rangle = I|y\rangle = \int dx\, |x\rangle\langle x|y\rangle = \int dx\, |x\rangle\, \delta(x - y) = |y\rangle,$$

as desired. And the basis states are position eigenstates: we can write the equation

$$\hat{x}|x\rangle = x|x\rangle.$$

So the $\hat{x}$ operator corresponds to the position observable, and the eigenvalue for the state $|x\rangle$ is just $x$.


So now we can make this a bit less abstract: if we have a particle in a state $|\psi\rangle$, we can write the wave function as

$$\psi(x) = \langle x|\psi\rangle.$$

This is the overlap between $|x\rangle$ and $|\psi\rangle$, which gives us some complex number dependent on $x$. And with this knowledge, we can rewrite

$$|\psi\rangle = I|\psi\rangle = \int dx\, |x\rangle\langle x|\psi\rangle = \int dx\, |x\rangle\, \psi(x).$$

And we can interpret this equation as saying that our state $|\psi\rangle$ is a superposition of the basis states, and the weight of each basis state $|x\rangle$ in our sum is just the value of our wavefunction at that point $x$! And this helps us answer a slightly more general question: if we have two states $\phi$ and $\psi$, we can calculate their overlap via

$$\langle \phi|\psi\rangle = \langle \phi| \int dx\, |x\rangle\langle x|\psi\rangle$$

(putting a copy of the identity in between), and now we can bring everything inside to find

$$= \int dx\, \langle \phi|x\rangle\langle x|\psi\rangle = \int dx\, \phi^*(x)\psi(x).$$

Indeed, this is what we expect: the inner product on the state space comes from integrating the complex conjugate of one function against the other function.

Now if we try to compute a matrix element of the $\hat{x}$ operator, we'll put in another copy of the identity:

$$\langle \phi|\hat{x}|\psi\rangle = \int dx\, \langle \phi|\hat{x}|x\rangle\langle x|\psi\rangle = \int dx\, x\, \langle \phi|x\rangle\langle x|\psi\rangle = \int dx\, x\, \phi^*(x)\psi(x).$$

In other words, we're just putting an extra $x$ into the integral here, which is again what we expect.
To make this interesting, we'll introduce momentum states as well: these behave exactly the same way, so we'll just list some properties. The basis states are labeled by $|p\rangle$, where $p \in \mathbb{R}$, and we have the familiar equations

$$\langle p'|p\rangle = \delta(p - p'), \quad I = \int dp\, |p\rangle\langle p|, \quad \hat{p}|p\rangle = p|p\rangle.$$

So completeness and normalization work the same way: the only difference is that we now have to establish a relation between the $x$-basis and $p$-basis. The physical assumption here is that a particle with momentum $p$ has the wave function

$$\langle x|p\rangle = \psi_p(x) = \frac{1}{\sqrt{2\pi\hbar}} e^{ipx/\hbar}.$$

(Remember that the first equality here is established by the idea of an “overlap” itself.) So now if we want to compute something like $\langle p|\psi\rangle$ in terms of $x$-wavefunctions, we introduce another complete set of states:

$$\langle p|\psi\rangle = \int dx\, \langle p|x\rangle\langle x|\psi\rangle = \frac{1}{\sqrt{2\pi\hbar}} \int dx\, e^{-ipx/\hbar}\psi(x),$$

which is the Fourier transform of the wave function! To distinguish it from the usual wave function, we'll call it $\tilde{\psi}(p)$ (it lives in the momentum space instead of the coordinate space). In other words, this is the wave function in our $p$ basis.


Example 144 
Now we're ready for a classic computation, where the momentum operator acts on a state and we take the overlap with a coordinate bra: how can we compute

$$\langle x|\hat{p}|\psi\rangle?$$

This looks like the momentum operator acting on the wave function (that's $\hat{p}|\psi\rangle$) in the $x$-basis (that's the $\langle x|$ part). So we expect that it'll be $\frac{\hbar}{i}\frac{d}{dx}\psi(x)$, but we can check this directly! To manipulate this expression, we need to figure out what to do with $\hat{p}$. All we know about this operator is that it has momentum eigenstates, so we'll introduce a complete set of states:

$$\langle x|\hat{p}|\psi\rangle = \int dp\, \langle x|\hat{p}|p\rangle\langle p|\psi\rangle.$$

And now we can evaluate this a bit: $\hat{p}|p\rangle$ is just $p|p\rangle$, so this gives us

$$= \int dp\, \big(p\, \langle x|p\rangle\big)\langle p|\psi\rangle.$$

So we don't need to work too hard from here: the idea is that we can get a $p$ to multiply the $\langle x|p\rangle = \frac{1}{\sqrt{2\pi\hbar}} e^{ipx/\hbar}$ by applying $\frac{\hbar}{i}\frac{d}{dx}$ to it. So this is just equal to

$$= \int dp\, \left(\frac{\hbar}{i}\frac{d}{dx}\langle x|p\rangle\right)\langle p|\psi\rangle,$$

and now we can take the $\frac{\hbar}{i}\frac{d}{dx}$ out of the integral because it's a $p$-integral and there's only a single factor that depends on $x$ (the factor $\langle p|\psi\rangle$ doesn't have an $x$ dependence)! So now our expression is just

$$= \frac{\hbar}{i}\frac{d}{dx}\int dp\, \langle x|p\rangle\langle p|\psi\rangle = \frac{\hbar}{i}\frac{d}{dx}\langle x|\psi\rangle$$

(last step by getting rid of the complete set of states), which is exactly what we claimed. But once we've seen this once, we don't need to repeat it again: the operator $\hat{p}$ is indeed what we expect. And if we want to do some more practice, we can derive the opposite relation

$$\langle p|\hat{x}|\psi\rangle = i\hbar\frac{d}{dp}\tilde{\psi}(p).$$


14 February 24, 2020 


We're starting to talk about bra-kets now, so let's focus on projectors. Specifically, the punchline we should keep in mind is that in a general projector, we project down onto a subspace $U$ (so every vector is sent to $U$). But in an orthogonal projector, we have an orthogonal subspace $U^\perp$ which is sent to zero under the projection.

Example 145

Pick a line in the plane through the origin: this gives us a one-dimensional subspace $U$. Then the orthogonal subspace $U^\perp$ is the perpendicular line to $U$, and $U \oplus U^\perp$ is the whole space.

This means that any vector $v$ in the plane can be written as the sum of a vector in $U$ and a vector in $U^\perp$. What's nice about orthogonal projectors in particular is that the component in $U^\perp$ will disappear when we project!


Example 146 
Pick a plane $U$ through the origin, which is a two-dimensional subspace of our three-dimensional space. One way to specify this plane is to specify a unit vector $\hat{n}$ orthogonal to the plane. Then we can define the plane via

$$\hat{n} \cdot \vec{x} = 0 \implies n_1 x_1 + n_2 x_2 + n_3 x_3 = 0.$$

It's interesting that we're defining $U$, instead of $U^\perp$, to be the set of vectors that are “perpendicular” in some sense, but hopefully this shows us some of the symmetry here.

Remark 147. A few notes about this: subspaces have to go through the origin because every subspace has the zero vector. Also, the orthogonal subspace $U^\perp$ is the set of scalar multiples of $\hat{n}$.


Problem 148 


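So now here's a challenge: how can we find the projector $P_U$, and describe it as a $3 \times 3$ matrix?

One strategy is that assuming $(n_1, n_2) \neq (0, 0)$ (the other case is easy because it's just the projector onto the $xy$-plane), we can construct an orthonormal basis of $U$: $\hat{x} = \frac{1}{\sqrt{n_1^2 + n_2^2}}(-n_2, n_1, 0)$ and $\hat{y} = \hat{n} \times \hat{x}$ (the cross product). And then

$$P_U \vec{v} = (\hat{x} \cdot \vec{v})\hat{x} + (\hat{y} \cdot \vec{v})\hat{y},$$

so the matrix of $P_U$ takes the form $D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$ in the orthonormal basis $(\hat{x}, \hat{y}, \hat{n})$, and then we can do a change of basis $B D B^{-1}$, where $B$ has columns $\hat{x}, \hat{y}, \hat{n}$, to get the matrix in the standard basis.

But there's a simpler way: note that

$$P_U v = v - P_{\hat{n}} v \implies P_U = I - |\hat{n}\rangle\langle\hat{n}|,$$

because we take the vector and subtract off the component projected onto the orthogonal subspace. (Another way to write this in more standard math notation is that $P_U \vec{v} = \vec{v} - (\vec{v} \cdot \hat{n})\hat{n}$.) And this is just

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} n_1 \\ n_2 \\ n_3 \end{pmatrix}\begin{pmatrix} n_1 & n_2 & n_3 \end{pmatrix} = \begin{pmatrix} 1 - n_1^2 & -n_1 n_2 & -n_1 n_3 \\ -n_1 n_2 & 1 - n_2^2 & -n_2 n_3 \\ -n_1 n_3 & -n_2 n_3 & 1 - n_3^2 \end{pmatrix}$$

(an alternative way to arrive at this answer is to write out $(\vec{v} \cdot \hat{n})\hat{n}$ in terms of $v_1, v_2, v_3, n_1, n_2, n_3$, or to use index notation to show that $P_{ij} = \delta_{ij} - n_i n_j$.) We can check that this has some of the properties we expect: for example, the trace is $3 - n_1^2 - n_2^2 - n_3^2 = 2$, which is indeed the trace of a two-dimensional projector (the dimension of the subspace we project onto).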
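The formula $P_{ij} = \delta_{ij} - n_i n_j$ is simple enough to check directly. Here is a minimal sketch in plain Python, using the unit normal $\hat{n} = (1, 2, 2)/3$ as an arbitrary illustrative choice (it is not from the lecture).

```python
def plane_projector(n):
    """P_ij = delta_ij - n_i n_j for a unit normal vector n."""
    return [[(1.0 if i == j else 0.0) - n[i] * n[j] for j in range(3)]
            for i in range(3)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, v):
    return [sum(row[j] * v[j] for j in range(3)) for row in m]

N_HAT = [1 / 3, 2 / 3, 2 / 3]          # unit normal: (1 + 4 + 4) / 9 = 1
P = plane_projector(N_HAT)

P_squared = matmul(P, P)               # projector property: P^2 = P
trace = sum(P[i][i] for i in range(3)) # dimension of the plane
image_of_normal = apply(P, N_HAT)      # the normal direction is sent to zero

in_plane = [2.0, -1.0, 0.0]            # satisfies n . v = 2/3 - 2/3 = 0
image_in_plane = apply(P, in_plane)    # vectors in the plane are left unchanged
```

All four quantities behave as the text predicts: `P_squared` equals `P`, `trace` is 2, `image_of_normal` vanishes, and `image_in_plane` equals `in_plane`, up to rounding.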
Remark 149. The main idea here is that if we know that $(Pv)_1$ is some specific linear combination $a_1 v_1 + a_2 v_2 + a_3 v_3$, then $a_1, a_2, a_3$ are specific elements we can write down in our matrix. But alternatively, we can just use some nice properties of projectors and calculate easily with bra-ket notation.

Remember that we can write down projectors more generally in bra-ket notation: for example, if $e_1$ and $e_2$ form an orthonormal basis for $U$, then

$$P_U = |e_1\rangle\langle e_1| + |e_2\rangle\langle e_2|.$$

One important property to check is that our projector is Hermitian: indeed, because bras and kets flip, but we also look at a list of operators in reverse, projectors are indeed equal to their adjoint.

And finally, we can think a bit more about completeness to bring everything together:

$$|e_1\rangle\langle e_1| + |e_2\rangle\langle e_2| + |\hat{n}\rangle\langle\hat{n}| = I.$$

And so indeed this is another way we could have arrived at the fact that the projector is just $I - |\hat{n}\rangle\langle\hat{n}|$.


15 Uncertainty Principle and Compatible Observables, Part 1 


Today, we're going to start talking about the uncertainty associated to a Hermitian operator. An important idea is that uncertainty is always measured relative to a state (mathematically, a deviation is always measured relative to some center point). So we'll always have some Hermitian operator $A$ and some state $\psi$ when we're making our arguments.

Definition 150

The expectation value of an operator $A$ in a state $\psi$ is

$$\langle A\rangle_\psi = \langle \psi|A\psi\rangle = (\psi, A\psi).$$

The important point here is that this is always real, because the expectation value of a Hermitian operator is real (since $(\psi, A\psi) = (A\psi, \psi) = (\psi, A\psi)^*$).
From this, how can we define an uncertainty? We need to make sure this quantity is zero at an eigenstate and 


nonzero otherwise. 


Definition 151 


The uncertainty of an operator $A$ relative to a normalized state $\psi$ is

$$\Delta A(\psi) = |(A - \langle A\rangle I)\psi|.$$

The idea is that this uncertainty should always be a nonnegative number, and the norm is a natural object that behaves in that way. In fact, the norm of a vector is only zero if we have the zero vector. So let's check that we have what we want: if the uncertainty were zero, then we must have $(A - \langle A\rangle I)\psi = 0$, which means that

$$A\psi - \langle A\rangle I\psi = 0 \implies A\psi = \langle A\rangle\psi.$$

And this is an eigenvector equation with the eigenvalue $\langle A\rangle$, so we are indeed in an eigenstate when the uncertainty is zero. In fact, in an eigenstate with $A\psi = \lambda\psi$,

$$(\psi, A\psi) = (\psi, \lambda\psi) = \lambda$$

because $\psi$ is normalized, and thus the expectation value of $A$ is indeed $\lambda$, the eigenvalue of this particular eigenstate. (So everything is nice and consistent!)

Going back to our argument, if we're in an eigenstate, we indeed have the eigenvalue equation, so the vector $(A - \langle A\rangle I)\psi$ is zero, and thus the uncertainty is zero. So we do have an if and only if statement. This second fact, where the only vector with zero norm is the zero vector, is quite powerful.


Squaring the definition, we'll find a formula that is more familiar and better for computations. Note that

$$(\Delta A(\psi))^2 = \big( (A - \langle A \rangle I)\psi, \; (A - \langle A \rangle I)\psi \big)$$

(the norm squared is the inner product of the vector with itself), and now we want to move $(A - \langle A \rangle I)$ from one
argument to the other. We should then take the adjoint of our operator, but $A$ is Hermitian and $\langle A \rangle$ is a real number,
so $A - \langle A \rangle I$ is equal to its own adjoint! This means that

$$(\Delta A(\psi))^2 = \big( \psi, \; (A - \langle A \rangle I)(A - \langle A \rangle I)\psi \big).$$

And now we can simplify the operator in the right argument: expanding out gives us

$$= \big( \psi, \; (A^2 - A\langle A \rangle - \langle A \rangle A + \langle A \rangle^2)\psi \big)$$

(where we've dropped the identity operator for convenience of notation), and now we can expand out the terms
one by one. The first term is $(\psi, A^2\psi) = \langle A^2 \rangle$, and the second term gives us (pulling the number outside)
$-\langle A \rangle (\psi, A\psi) = -\langle A \rangle^2$. Similarly, the last two terms evaluate to $-\langle A \rangle^2$ and $+\langle A \rangle^2$ respectively, and putting this all
together gives us the important formula:


Proposition 152 


The uncertainty of an operator $A$ can be expressed via the equation

$$(\Delta A(\psi))^2 = \langle A^2 \rangle - \langle A \rangle^2.$$

(This obviously has connections with standard deviation if we use the probabilistic interpretation!) And in particular,
this tells us that

$$\langle A^2 \rangle \ge \langle A \rangle^2,$$

because $(\Delta A(\psi))^2$ is always nonnegative.
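As a quick sanity check, here is a minimal numerical sketch of the two facts above: the norm definition of the uncertainty agrees with $\langle A^2 \rangle - \langle A \rangle^2$, and the uncertainty vanishes in an eigenstate. The random matrix, the dimension 4, and the helper names `random_hermitian`, `expectation`, and `uncertainty` are illustrative choices, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hermitian(n, rng):
    # (X + X^dagger)/2 is Hermitian for any complex matrix X
    X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (X + X.conj().T) / 2

def expectation(op, psi):
    # <psi| op |psi>; real for Hermitian op, so we keep the real part
    return np.vdot(psi, op @ psi).real

def uncertainty(op, psi):
    # Definition: norm of (op - <op> I) psi for a normalized psi
    return np.linalg.norm(op @ psi - expectation(op, psi) * psi)

A = random_hermitian(4, rng)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)

lhs = uncertainty(A, psi) ** 2
rhs = expectation(A @ A, psi) - expectation(A, psi) ** 2

# In an eigenstate of A the uncertainty should vanish
eigvals, eigvecs = np.linalg.eigh(A)
eig_unc = uncertainty(A, eigvecs[:, 0])
```

Here `lhs` and `rhs` should agree up to rounding, and `eig_unc` should be zero up to rounding.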

We'll take a moment to give a geometrical interpretation of the uncertainty here, which isn't quite as well known
as the rest of the discussion. We can imagine having a vector $\psi$ and an operator $A$: if $\psi$ is not an eigenstate of $A$,
then $A\psi$ will point in a different direction from $\psi$. If we let $U_\psi$ be the one-dimensional vector
space spanned by $\psi$, we can make two claims about the orthogonal projection of $A\psi$ onto $U_\psi$:
• The result of the orthogonal projection is $\langle A \rangle \psi$.
• The orthogonal component $\psi_\perp$ has length equal to the uncertainty of $A$ in the state $\psi$.

In other words, the amount we've moved away from $U_\psi$ tells us about the uncertainty! To prove this, we can
write our orthogonal projector as

$$P_{U_\psi} = |\psi\rangle \langle\psi|,$$

and then when we project our state $A\psi$, we have

$$P_{U_\psi}(A|\psi\rangle) = |\psi\rangle \langle\psi| A |\psi\rangle = |\psi\rangle \langle A \rangle,$$

which is the first claim that we made. And the rest is pretty simple: the vector $A|\psi\rangle - |\psi\rangle \langle A \rangle$ is orthogonal to $\psi$,
because we can put a $\langle\psi|$ on the left to get $\langle A \rangle - \langle A \rangle = 0$, so this vector is indeed $\psi_\perp$. And now the norm of this
perpendicular vector is the definition of our uncertainty! So the main point here is that our ideas from orthogonal
projectors help us to understand uncertainty more pictorially.
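The two geometric claims can be checked numerically. The sketch below splits $A\psi$ into its component along $\psi$ and the perpendicular remainder; the random operator and state are arbitrary illustrative inputs.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (X + X.conj().T) / 2                      # Hermitian operator
psi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi /= np.linalg.norm(psi)                    # normalized state

A_psi = A @ psi
mean_A = np.vdot(psi, A_psi).real             # <A>
parallel = np.vdot(psi, A_psi) * psi          # |psi><psi| applied to A psi
perp = A_psi - parallel                       # component orthogonal to psi

# Uncertainty from the variance formula, for comparison
mean_A2 = np.vdot(psi, A @ A_psi).real
delta_A = np.sqrt(mean_A2 - mean_A ** 2)
```

The projection `parallel` should equal `mean_A * psi`, and the length of `perp` should equal `delta_A`.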


So now let’s do a computation as an example: 


Example 153 


Suppose we have a state $|\psi\rangle = |+\rangle$ which is an eigenstate of $S_z$. What is the uncertainty $\Delta S_x$ in the state $\psi$?

We know that if we're in an eigenstate of $S_z$, we're not in an eigenstate of $S_x$; in fact, we're in a superposition of
two eigenstates of $S_x$. So there should be some amount of uncertainty. Let's try to figure out the quickest way to do
the problem.

Many times, we'll just want to use the formula

$$(\Delta A(\psi))^2 = \langle A^2 \rangle_\psi - \langle A \rangle_\psi^2.$$

The expectation of $S_x$ in this state is

$$\langle S_x \rangle = \langle +| S_x |+\rangle,$$

and we should expect this expectation to be zero (because there's an equal chance of measuring $\hbar/2$ and $-\hbar/2$). But if we didn't
know that, the best way to do this is to use the matrix representation of $S_x$ (which we don't need to know by heart):
writing out

$$S_x = \frac{\hbar}{2} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad \text{and} \quad |+\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix},$$

we have

$$S_x |+\rangle = \frac{\hbar}{2} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \frac{\hbar}{2} |-\rangle.$$

Since we're taking the inner product of this with $\langle +|$ on the left, we'll indeed get zero (by orthogonality).

Remember, though, that this does not mean the actual uncertainty is zero: we also have an $\langle S_x^2 \rangle$ term, which is
nonzero. Remember that $S_x$ is a funny matrix: $S_x^2 = (\hbar/2)^2 I$, which means that the expectation
value of $S_x^2$ is just $(\hbar/2)^2$ (because the identity always has expectation value 1 on any normalized state). So plugging
everything back in,

$$(\Delta S_x)^2 = \left(\frac{\hbar}{2}\right)^2 - 0 \implies \Delta S_x = \frac{\hbar}{2}.$$
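The same computation can be done with explicit matrices. This is a small sketch in units where $\hbar = 1$ (an assumption made only for the numerics), using the standard matrix representation of $S_x$ in the $S_z$ basis.

```python
import numpy as np

hbar = 1.0  # units with hbar = 1, chosen for convenience

Sx = (hbar / 2) * np.array([[0, 1], [1, 0]], dtype=complex)
plus = np.array([1, 0], dtype=complex)        # |+>, the S_z = +hbar/2 eigenstate

mean_Sx = np.vdot(plus, Sx @ plus).real       # <S_x>
mean_Sx2 = np.vdot(plus, Sx @ Sx @ plus).real # <S_x^2>
delta_Sx = np.sqrt(mean_Sx2 - mean_Sx ** 2)   # uncertainty of S_x in |+>
```

With these inputs, `mean_Sx` is zero and `delta_Sx` equals `hbar / 2`, matching the result above.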


Now we're ready to state something more powerful: 


Theorem 154 (Uncertainty principle) 


For two Hermitian operators $A, B$ and a normalized state $\psi$,

$$(\Delta A)^2 (\Delta B)^2 \ge \left( \langle\psi| \tfrac{1}{2i}[A, B] |\psi\rangle \right)^2$$

(where the uncertainties are taken relative to $\psi$).

First of all, we have an inequality, so we need to first make sure that the number on the right side is real.
(Otherwise, it doesn't make sense to compare the two sides.) And it's particularly confusing because there seems to
be an $i$ here. But the right idea is to focus on the operator $\frac{1}{i}[A, B] = \frac{1}{i}(AB - BA)$. We know that the commutator
$[A, B]$ on its own is not going to give us a real number: for instance, $[x, p] = i\hbar$. So that gives us a hint as to why
the $i$ is important: the Hermitian conjugate is

$$\left( \frac{1}{i}[A, B] \right)^\dagger = -\frac{1}{i}(AB - BA)^\dagger = -\frac{1}{i}(B^\dagger A^\dagger - A^\dagger B^\dagger),$$

and now because $A$ and $B$ are Hermitian, this is equal to

$$= -\frac{1}{i}(BA - AB) = \frac{1}{i}[A, B].$$

So indeed this operator is equal to its adjoint, so the operator $\frac{1}{i}[A, B]$ is Hermitian and thus has real expectation
values! And now we don't need to worry: the right side of the uncertainty inequality is indeed a nonnegative real
number, so everything makes sense.


Remark 155. We can also take the square root of both sides above and write

$$\Delta A \, \Delta B \ge \left| \langle\psi| \tfrac{1}{2i}[A, B] |\psi\rangle \right|,$$

where the outer bars are just the ordinary absolute value.


Either way, the uncertainty principle is nice because we've now defined uncertainties of an observable precisely: we
don't need to make an approximate order-of-magnitude statement. And this is an important result, so we need to
prove it, not just because it's mathematically better, but also because many interesting questions are based on the
concept of reducing uncertainty. For example, if the two operators $A$ and $B$ commute, the uncertainty principle
tells us that the product is at least 0, but it doesn't actually tell us outright whether we can get equality. We care about when
the uncertainty relation is saturated (that is, when we get the equality case), and we'll figure that out through our
proof.


First, though, we should mention the classic case: 


Example 156 


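Consider the two operators $A = \hat{x}$ and $B = \hat{p}$.

Then $[A, B] = i\hbar$, so the uncertainty principle tells us that

$$(\Delta x)^2 (\Delta p)^2 \ge \left( \langle\psi| \tfrac{1}{2i} \, i\hbar \, |\psi\rangle \right)^2.$$

Simplifying, the $i$'s cancel and $\psi$ is normalized, so we get

$$(\Delta x)^2 (\Delta p)^2 \ge \left( \frac{\hbar}{2} \right)^2,$$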
which is the classic result. The wavefunctions that saturate this (yielding equality) turn out to be Gaussian wave packets.
An eigenstate of $x$ (a totally localized particle) or of $p$ (a totally delocalized particle) is not normalizable, so those
strange functions can only be thought of as limiting cases.


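Proof of the uncertainty principle. The central idea of this proof is using the Schwarz inequality. We'll use two auxiliary
vectors here:

$$|f\rangle = (A - \langle A \rangle I) |\psi\rangle, \quad |g\rangle = (B - \langle B \rangle I) |\psi\rangle.$$

By definition, we know that $(\Delta A)^2 = \langle f|f \rangle$ and $(\Delta B)^2 = \langle g|g \rangle$. Then Schwarz's inequality tells us that

$$(\Delta A)^2 (\Delta B)^2 = \langle f|f \rangle \langle g|g \rangle \ge |\langle f|g \rangle|^2,$$

where saturation comes when the vectors $f$ and $g$ are parallel to each other. And now we can write this as

$$\boxed{(\Delta A)^2 (\Delta B)^2 \ge (\mathrm{Re}\, \langle f|g \rangle)^2 + (\mathrm{Im}\, \langle f|g \rangle)^2}$$

since $\langle f|g \rangle$ is just some complex number with a real and imaginary part. Now we can compute

$$\langle f|g \rangle = \langle\psi| (A - \langle A \rangle)(B - \langle B \rangle) |\psi\rangle,$$

and we can find the inner operator by directly expanding:

$$(A - \langle A \rangle)(B - \langle B \rangle) = AB - \langle A \rangle B - A \langle B \rangle + \langle A \rangle \langle B \rangle,$$

and the expectation of each of these terms is $\langle AB \rangle$, $-\langle A \rangle \langle B \rangle$, $-\langle A \rangle \langle B \rangle$, and $\langle A \rangle \langle B \rangle$, respectively (with the middle
two terms, we pull out the constants first). And thus we end up with

$$\langle f|g \rangle = \langle AB \rangle - \langle A \rangle \langle B \rangle,$$

and similarly (so that we can find the real and imaginary parts)

$$\langle g|f \rangle = \langle BA \rangle - \langle A \rangle \langle B \rangle,$$

just by interchanging the roles of $A$ and $B$. So now we plug back into the boxed equation above: we have that

$$\mathrm{Im}\, \langle f|g \rangle = \frac{1}{2i} \left( \langle f|g \rangle - \langle g|f \rangle \right),$$

and now this simplifies very nicely: the product of expectations cancels, and we just end up with the commutator
term $\langle\psi| \frac{1}{2i}[A, B] |\psi\rangle$. And notice that this is exactly what we want: we can even toss the first term on the right side of
the boxed equation above, because it's always nonnegative. But let's compute this for completeness: we have

$$\mathrm{Re}\, \langle f|g \rangle = \frac{1}{2} \left( \langle f|g \rangle + \langle g|f \rangle \right),$$

and this is the expectation value of $\frac{1}{2}$ of the anticommutator of the two operators $\check{A} = A - \langle A \rangle$ and $\check{B} = B - \langle B \rangle$. So at the end of the day,
the Schwarz inequality gives us

$$(\Delta A)^2 (\Delta B)^2 \ge \left( \langle\psi| \tfrac{1}{2i}[A, B] |\psi\rangle \right)^2 + \left( \langle\psi| \tfrac{1}{2}\{\check{A}, \check{B}\} |\psi\rangle \right)^2,$$

which is often called the generalized uncertainty principle. But to finish the proof, the second term on the right side
is a nonnegative real number, so tossing it keeps the inequality, and we've arrived at our final result. (And in almost
all physical examples, the second term is not useful.)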
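To see the pieces of the proof in action, here is a minimal numerical sketch that builds $|f\rangle$ and $|g\rangle$ for random Hermitian matrices and checks both the decomposition of $|\langle f|g \rangle|^2$ and the resulting inequalities. The random inputs, the dimension 5, and the helper names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5

def random_hermitian(n):
    X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (X + X.conj().T) / 2

A, B = random_hermitian(n), random_hermitian(n)
psi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi /= np.linalg.norm(psi)
I = np.eye(n)

def expval(op):
    return np.vdot(psi, op @ psi)

# Shifted operators A - <A> and B - <B>
A_chk = A - expval(A).real * I
B_chk = B - expval(B).real * I

f, g = A_chk @ psi, B_chk @ psi
lhs = np.vdot(f, f).real * np.vdot(g, g).real              # (Delta A)^2 (Delta B)^2
schwarz = abs(np.vdot(f, g)) ** 2                          # |<f|g>|^2

comm_term = expval((A @ B - B @ A) / 2j).real              # <(1/2i)[A, B]>
anti_term = expval((A_chk @ B_chk + B_chk @ A_chk) / 2).real  # <(1/2){A_chk, B_chk}>
```

The quantity `schwarz` should equal `comm_term**2 + anti_term**2`, and `lhs` should be at least as large as that sum (and hence at least `comm_term**2`).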


So let's return now to the question of saturation. In order for the uncertainty inequality to actually be an equality,
we need two conditions. First of all, we need the Schwarz inequality to be saturated, so $f$ and $g$ must be states
that are proportional to each other: $|g\rangle = \beta |f\rangle$ for some complex number $\beta$. But we also need the second term
$\langle\psi| \frac{1}{2}\{A - \langle A \rangle, B - \langle B \rangle\} |\psi\rangle$ to be zero, that is, the real part of $\langle f|g \rangle$ is zero. In other words, we need

$$\langle f|g \rangle + \langle g|f \rangle = 0 \implies \langle f|\beta f \rangle + \langle \beta f|f \rangle = (\beta + \beta^*) \langle f|f \rangle = 0.$$

Since $f$ does not have zero norm (there is some uncertainty in all of this), $\beta + \beta^* = 0$, which means $\beta$ is a pure
imaginary number. So the equality condition is actually nice: $f$ and $g$ are parallel with a purely imaginary constant,
and rewriting this out, our condition for saturation is that

$$(B - \langle B \rangle) |\psi\rangle = i\lambda (A - \langle A \rangle) |\psi\rangle$$

for some real number $\lambda$. This is a somewhat weird equation to work with, but we'll often have, for instance, a
differential equation (such as for $\hat{x}$ and $\hat{p}$) if we use the same coordinate representation for
both operators. Solving for $\psi$, we can then find $\langle B \rangle$ and $\langle A \rangle$ and see whether that ansatz allows for a value $\lambda$. But
before we try too hard, let's take the norm of the equation above: we find that

$$\Delta B = |\lambda| \, \Delta A,$$

and thus $\lambda = \pm \frac{\Delta B}{\Delta A}$.
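As an illustration of the saturation condition, take $A = \hat{x}$ and $B = \hat{p}$: the condition becomes a first-order differential equation whose normalizable solutions are Gaussians. The sketch below checks this numerically on a grid; the width `sigma`, the grid, and the use of finite differences for the derivative are illustrative assumptions, with $\hbar = 1$.

```python
import numpy as np

hbar = 1.0
sigma = 0.7                                   # hypothetical width of the packet
x = np.linspace(-12.0, 12.0, 4801)
dx = x[1] - x[0]

# Real Gaussian centered at the origin, so <x> = 0 and <p> = 0
psi = np.exp(-x ** 2 / (4 * sigma ** 2))
psi /= np.sqrt(np.sum(psi ** 2) * dx)

delta_x = np.sqrt(np.sum(x ** 2 * psi ** 2) * dx)
dpsi = np.gradient(psi, dx)                   # finite-difference derivative
delta_p = hbar * np.sqrt(np.sum(dpsi ** 2) * dx)  # sqrt(<p^2>) since <p> = 0

product = delta_x * delta_p

# Saturation condition: p psi = i * lam * x psi with lam = delta_p / delta_x
lam = delta_p / delta_x
residual = np.max(np.abs(-1j * hbar * dpsi - 1j * lam * x * psi))
```

Up to discretization error, `product` should be `hbar / 2` and `residual` should be close to zero.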


In the remainder of this lecture, we'll talk about some related notions of uncertainty. We'll start with a handwavy 


motivation for energy-time uncertainty, just to get a picture of what's going on here. 


Example 157 


Suppose we detect a fluctuation of a waveform in time which suddenly turns on some waves and dies off after a
while: the whole process lasts for a time $T$.

In such a situation, we can try to count the number of full waves that we see: this will be

$$N = \frac{\omega T}{2\pi},$$

where $2\pi/\omega$ is the period of the wave. The problem is that the waves begin and end, so we can't really see the beginning
or end: there's an uncertainty $\Delta N$ of order 1. And if there's no uncertainty in $T$, there is, in some sense, an
uncertainty in $\omega$ of

$$\frac{T}{2\pi} \Delta\omega = 1 \implies \Delta\omega = \frac{2\pi}{T}.$$

And now we can associate this with a quantum mechanical object: for a photon, for example, the uncertainty in energy is

$$\Delta E = \hbar \Delta\omega \implies \Delta E = \frac{2\pi\hbar}{T}.$$

So now $T$ is the amount of time it took for the photon to go through our detector: we saw the wave for some time
$T$, and this is now related in some way to the energy uncertainty of our photon. And now there's some kind of relationship
between time and energy!

But there's a delicate issue here: what exactly is “time uncertainty”? We can't really define this precisely, because
there's no Hermitian operator for which the eigenstates are times. Instead, we'll need to do something different: we'll
use the current uncertainty principle with the Hamiltonian operator $A = H$, along with some operator $B = Q$ which
has no explicit time dependence. What we find, then, is that

$$(\Delta H)^2 (\Delta Q)^2 \ge \left( \langle\psi| \tfrac{1}{2i}[H, Q] |\psi\rangle \right)^2.$$

And from here, we'll need an auxiliary result: this commutator actually has to do with the time derivative! (Even
though $Q$ has no explicit time-dependence, $\frac{d\langle Q \rangle}{dt}$ can still be nonzero.) Basically, we can write

$$\langle Q \rangle = (\psi, Q\psi)$$

and take the time-derivative of this expectation. Even though $Q$ has no explicit dependence on $t$, it might depend on
$x$ and $p$, whose values can change as the wavefunction evolves. So

$$\frac{d\langle Q \rangle}{dt} = \left( \frac{\partial\psi}{\partial t}, Q\psi \right) + \left( \psi, Q \frac{\partial\psi}{\partial t} \right)$$

by the product rule, where we've used that $Q$ has no explicit time-dependence, and here's where the Schrödinger
equation comes in:

$$i\hbar \frac{\partial\psi}{\partial t} = H\psi,$$

and plugging this in yields

$$\frac{d\langle Q \rangle}{dt} = \left( \frac{1}{i\hbar} H\psi, Q\psi \right) + \left( \psi, Q \frac{1}{i\hbar} H\psi \right).$$

The constants come out (one as its complex conjugate and one as normal), and we can send $H$ to the other side
because $H$ is Hermitian. Thus, this is

$$= \frac{i}{\hbar} (\psi, HQ\psi) + \frac{1}{i\hbar} (\psi, QH\psi) = \frac{i}{\hbar} \left( \psi, (HQ - QH)\psi \right).$$

And now we've arrived at our result:


Proposition 158 
For any operator $Q$ with no explicit time-dependence,

$$\frac{d}{dt} \langle Q \rangle = \frac{i}{\hbar} \langle [H, Q] \rangle.$$
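This proposition is easy to test numerically in a finite-dimensional model. The sketch below evolves a state under a random Hermitian $H$ (via its eigendecomposition) and compares a finite-difference derivative of $\langle Q \rangle$ with $\frac{i}{\hbar} \langle [H, Q] \rangle$. The random matrices, the step `dt`, and the helper names are illustrative assumptions, with $\hbar = 1$.

```python
import numpy as np

hbar = 1.0
rng = np.random.default_rng(3)
n = 4

def random_hermitian(n):
    X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (X + X.conj().T) / 2

H, Q = random_hermitian(n), random_hermitian(n)
psi0 = rng.normal(size=n) + 1j * rng.normal(size=n)
psi0 /= np.linalg.norm(psi0)

E, V = np.linalg.eigh(H)                      # H = V diag(E) V^dagger

def evolve(t):
    # psi(t) = exp(-i H t / hbar) psi(0), computed in the energy eigenbasis
    return V @ (np.exp(-1j * E * t / hbar) * (V.conj().T @ psi0))

def mean_Q(t):
    psi = evolve(t)
    return np.vdot(psi, Q @ psi).real

dt = 1e-5
numeric = (mean_Q(dt) - mean_Q(-dt)) / (2 * dt)   # central difference at t = 0
formula = ((1j / hbar) * np.vdot(psi0, (H @ Q - Q @ H) @ psi0)).real
```

The two numbers `numeric` and `formula` should agree up to the finite-difference error.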


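We'll see in a few lectures how to write this in the Heisenberg way as well. But the point is that we can plug this
back into our uncertainty relation between $H$ and $Q$: since $\langle [H, Q] \rangle = \frac{\hbar}{i} \frac{d\langle Q \rangle}{dt}$, we now have

$$(\Delta H)^2 (\Delta Q)^2 \ge \left( \frac{1}{2i} \frac{\hbar}{i} \frac{d\langle Q \rangle}{dt} \right)^2,$$

and since we're taking norms of everything, we find that

$$(\Delta H)^2 (\Delta Q)^2 \ge \left( \frac{\hbar}{2} \right)^2 \left( \frac{d\langle Q \rangle}{dt} \right)^2 \implies \Delta H \, \Delta Q \ge \frac{\hbar}{2} \left| \frac{d\langle Q \rangle}{dt} \right|.$$

This isn't quite a time uncertainty relation yet, but now we just need to figure out some definitions: we can consider
the quantity

$$\Delta t = \frac{\Delta Q}{|d\langle Q \rangle / dt|},$$

which has the units of time and can be physically interpreted as the amount of time it takes for $\langle Q \rangle$ to change by
some amount $\Delta Q$; that is, it's a measure of how much time is required for a significant change. If $\Delta Q$ is significant
and comparable to $Q$, this is the time needed for significant change, and now we have that

$$\Delta H \, \Delta t \ge \frac{\hbar}{2}.$$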


And this is the best kind of “time” we can get with our current uncertainty principle. We can make some complaints
about this equation we've just written down: for example, $\Delta t$ depends on which operator $Q$ we're using, but we
can try different $Q$'s and get more precise results. And there's a version of the uncertainty principle which gives an
alternative picture of all of this: if we have a state $\psi$ which is an energy eigenstate, then nothing changes (it's stationary).
Indeed, in such a state, $\Delta H$ is zero, and there is an “infinite” time $\Delta t$ for things to change. But if we have a state that
is not an eigenstate of energy, perhaps a superposition of two eigenstates at different energies, we can time-evolve our
state, and we can measure the changes of this state by seeing how long it takes for the state to become orthogonal
to its initial value. It turns out that we can get another uncertainty principle out of that:


Proposition 159 


Suppose $\Delta t$ is the quickest possible time it takes for $\psi(x, t)$ to become orthogonal to $\psi(x, 0)$. Then

$$\Delta H \, \Delta t \ge \frac{h}{4}.$$
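For intuition, here is a small sketch for an equal superposition of two energy eigenstates, which is the simplest state that ever becomes orthogonal to itself. It assumes the bound in the proposition reads $\Delta H \, \Delta t \ge h/4$ with $h = 2\pi\hbar$; the energies `E1`, `E2` are arbitrary illustrative values and $\hbar = 1$.

```python
import numpy as np

hbar = 1.0
h = 2 * np.pi * hbar
E = np.array([0.3, 1.7])                      # hypothetical energies E1, E2
c = np.array([1.0, 1.0]) / np.sqrt(2)         # equal superposition amplitudes
probs = np.abs(c) ** 2

mean_H = np.sum(probs * E)
delta_H = np.sqrt(np.sum(probs * E ** 2) - mean_H ** 2)

def overlap(t):
    # <psi(0)|psi(t)> for a state expanded in energy eigenstates
    return np.sum(probs * np.exp(-1j * E * t / hbar))

# First time at which the two phases differ by pi
t_perp = np.pi * hbar / abs(E[1] - E[0])

# The overlap stays nonzero at all earlier sampled times
earlier = np.linspace(0.0, 0.99 * t_perp, 500)
min_earlier = min(abs(overlap(t)) for t in earlier)
```

In this two-level case the overlap first vanishes at `t_perp`, and `delta_H * t_perp` equals `h / 4`, so under the stated assumption the bound is saturated.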


The next statement we'll make is a further comment on this “uncertainty of energy:” 


Proposition 160 


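In an isolated system (that is, one where the Hamiltonian is time-independent), $\Delta E$ is constant.

Proof. Take $Q = H$ in Proposition 158 to find that

$$\frac{d}{dt} \langle H \rangle = \frac{i}{\hbar} \langle [H, H] \rangle,$$

and now $[H, H] = 0$ (any operator commutes with itself), so the right hand side is zero. So $\langle E \rangle = \langle H \rangle$ is constant, and
similarly if we take $Q = H^2$,

$$\frac{d}{dt} \langle H^2 \rangle = \frac{i}{\hbar} \langle [H, H^2] \rangle,$$

and again $H$ and $H^2$ commute with each other, so this is also zero. Thus

$$\frac{d}{dt} (\Delta H)^2 = \frac{d}{dt} \left( \langle H^2 \rangle - \langle H \rangle^2 \right) = 0,$$

because the time derivatives of both terms on the right side are zero. Thus $\Delta H$ must be constant.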


This is useful for thinking about time-independent processes: 


Example 161 


Consider a decay (transition in an atom) which leads to photon radiation. Basically, an atom decays from an
excited state to a ground state, and it shoots out a photon.

The concept of energy uncertainty helps us organize our thoughts here: there is a typical lifetime $\tau$, corresponding
to the amount of time we need to wait for the excited state to decay. Over this lifetime, some observable
$Q$ changes a lot: for example, the position of the electron in its orbit, or its squared momentum, or some other
quantity. So it is indeed reasonable to define a time uncertainty here relative to that $Q$, and we'll also have an energy
uncertainty: the excited state must be some combination of different energy states, or else we'd be in a stationary
state! So the dynamics are such that the interactions (for example) between the electron and nucleus, or with a
radiation field, make this state unstable and associate an uncertainty $\Delta E$ with it. So we get

$$\Delta E \, \tau \sim \frac{\hbar}{2},$$

where $\Delta E$ is the “width” of our set of excited states. But later on, the atom goes to the ground state, so it
no longer has any energy uncertainty: conservation of uncertainty means that our photon now has an uncertainty $\Delta E = \hbar \Delta\omega$.
This is related to the hyperfine transition of hydrogen, which is a situation where physicists get very lucky. We'll
study later in this class that because of proton and electron spins in the hydrogen atom, energies actually split (due
to the magnetic dipole interaction), and we get a hyperfine splitting. Between the top and bottom states here, we
get a photon with a wavelength of about 21 centimeters, which corresponds to a 1420 MHz frequency and an energy
difference of $5.9 \times 10^{-6}$ eV.
But we shouldn't use this energy difference in our uncertainty principle: the uncertainty in energy comes
from how broad the “energy width” of the top state looks, due to interactions. And it turns out that the lifetime $\tau$ is
about 10 million years, and this corresponds to an energy uncertainty $\Delta E$ which is extremely small:

$$\Delta E \sim \frac{\hbar}{\tau}.$$

So the nice thing is that the 21 centimeter wavelength is easy to measure: energy-time uncertainty shows us that
the gap between the bottom and top states will be pretty much exactly constant!
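To get a feel for how small this width is, here is a back-of-the-envelope sketch of the arithmetic. It uses the order-of-magnitude estimate $\Delta E \sim \hbar/\tau$ with the lifetime and energy gap quoted above; the value of $\hbar$ in eV·s and the length of a year are standard constants, and the variable names are illustrative.

```python
# Order-of-magnitude estimate of the energy width of the 21 cm line
hbar_eV_s = 6.582119569e-16          # reduced Planck constant in eV * s
seconds_per_year = 365.25 * 24 * 3600

tau = 1.0e7 * seconds_per_year       # lifetime of about 10 million years, in seconds
delta_E = hbar_eV_s / tau            # energy width in eV (up to factors of order 1)

E_gap = 5.9e-6                       # hyperfine energy difference in eV
relative_width = delta_E / E_gap     # how sharp the line is relative to the gap
```

The width comes out of order $10^{-30}$ eV, which is smaller than the gap itself by more than twenty orders of magnitude.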


16 February 26, 2020 


We're starting to talk about properties of our (state) vectors and the uncertainty principle. Remember that if we have
a Hermitian operator $A$ (corresponding to an observable), we can define the uncertainty relative to a state $\Psi$ as

$$\Delta A(\Psi) = \| (A - \langle A \rangle I)\Psi \|.$$

The idea here is that we will measure our operator $A$ a bunch of times, and this will give us some statistics (such as
the mean and standard deviation). It's important that the uncertainty is dependent on our state: for example, the
uncertainty in the position depends on how “wide” our wavefunction is; it doesn't make sense to just say “uncertainty
in position.”

Recall that if $\Psi$ is an eigenstate of $A$, we will always measure the same value: the eigenvalue of $A$ corresponding
to $\Psi$. Here, $\langle A \rangle$ is the expectation value of our measurement of $A$, and thus if $A\Psi = \lambda\Psi$ for some normalized eigenvector $\Psi$,
then

$$A\Psi = \lambda\Psi \implies (\Psi, A\Psi) = (\Psi, \lambda\Psi) \implies \langle A \rangle = \lambda.$$

So the eigenvalue will also be the expectation value for our operator in an eigenstate! This may explain the notation
$A\Psi = \langle A \rangle_\Psi \Psi$. And remember that the uncertainty is only zero if the vector inside the norm on the right side is zero:
this means $(A - \langle A \rangle)\Psi = 0$, which only occurs for an eigenstate. This gives us a nice characterization:


Corollary 162 


The uncertainty of an operator in a state $\Psi$ vanishes if and only if $\Psi$ is an eigenstate of the operator.

A geometric interpretation of this is to think of our states again as vectors. Let $\Psi$ be some state, and let $U_\Psi$ be
the span of $\Psi$ (all scalar multiples of our original state). Then $A\Psi$ will be some other vector, which may not have unit
length, so it need not be normalized. And then one interpretation of our result is that the orthogonal projection of
$A\Psi$ onto $U_\Psi$ is $\langle A \rangle \Psi$, while the orthogonal part $(A\Psi)_\perp$ has length $\Delta A$.

Remark 163. Note that the collapse of the wavefunction doesn't have to do with this projection onto $\Psi$ unless $\Psi$
itself is an eigenstate (because the collapse of the wavefunction is probabilistic).

There are some operators which are time-dependent, but generally we can “forget about time.” At any fixed time,
we have a Hermitian operator, and we have a state, which allows us to calculate the uncertainty at that time. But later
in time, the state may have changed, which means the uncertainty also may change. Usually our operators $A$ are
time-independent, which makes our calculations easier, and that's what we'll be doing for a while.

What do we do about degenerate eigenstates? It turns out that it doesn't really matter: if $e_1, e_2$ are two eigenstates
with the same eigenvalue, any nonzero linear combination of those states is going to be an eigenstate, too. Remember
that uncertainty is measured with respect to a given state, so the uncertainty can still be zero in a whole plane of
eigenvectors!


Example 164 


Let $A$ be a Hermitian operator with two orthonormal eigenstates:

$$A |\psi_1\rangle = \lambda_1 |\psi_1\rangle, \quad A |\psi_2\rangle = \lambda_2 |\psi_2\rangle.$$

Take an arbitrary superposition $|\psi\rangle = \alpha_1 |\psi_1\rangle + \alpha_2 |\psi_2\rangle$, and assume that it is normalized. What is the uncertainty
of $A$ with respect to $\psi$?

We can use the formula (derived by squaring the definition of $\Delta A$ and then writing out the norm squared on the
right hand side as an inner product)

$$(\Delta A(\psi))^2 = \langle A^2 \rangle - \langle A \rangle^2.$$

Remark 165. Before we get too lost in the algebra, note that if $\alpha_1 = 0$ or $\alpha_2 = 0$, we have an eigenstate. This
means $\Delta A(\psi)$ should be zero if $\alpha_1$ or $\alpha_2$ vanishes. Also, if the $\lambda$s are the same, the uncertainty should also be zero
(because we always measure the same value).
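The sanity checks in this remark are easy to test numerically before doing the algebra. The sketch below works in the eigenbasis, where $A$ is diagonal, and computes the uncertainty from $\langle A^2 \rangle - \langle A \rangle^2$; the function name `two_state_uncertainty` and the sample amplitudes and eigenvalues are illustrative choices.

```python
import numpy as np

def two_state_uncertainty(a1, a2, lam1, lam2):
    # State alpha_1 |psi_1> + alpha_2 |psi_2> written in the eigenbasis of A
    psi = np.array([a1, a2], dtype=complex)
    psi = psi / np.linalg.norm(psi)           # enforce normalization
    A = np.diag([lam1, lam2]).astype(complex) # A is diagonal in its eigenbasis
    mean = np.vdot(psi, A @ psi).real
    mean_sq = np.vdot(psi, A @ A @ psi).real
    # Clip tiny negative rounding errors before the square root
    return np.sqrt(max(mean_sq - mean ** 2, 0.0))

eigenstate_case = two_state_uncertainty(1.0, 0.0, 2.0, 5.0)    # alpha_2 = 0
degenerate_case = two_state_uncertainty(0.6, 0.8j, 3.0, 3.0)   # equal eigenvalues
generic_case = two_state_uncertainty(0.6, 0.8j, 2.0, 5.0)
```

The first two cases give zero, as the remark predicts. The generic case gives $0.6 \cdot 0.8 \cdot 3 = 1.44$, which matches the product $|\alpha_1| |\alpha_2| |\lambda_1 - \lambda_2|$ obtained at the end of this example.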


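Remember that $A$ takes on value $\lambda_1$ with probability $|\alpha_1|^2$ and $\lambda_2$ with probability $|\alpha_2|^2$ (where $|\alpha_1|^2 + |\alpha_2|^2 = 1$),
so

$$\langle A \rangle = |\alpha_1|^2 \lambda_1 + |\alpha_2|^2 \lambda_2.$$

(One way to do this is to think of $\psi_1$ and $\psi_2$ as basis vectors $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$, so we know that $A = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}$ in this
basis. Then we can find the expectation in the state $\psi$ via $\psi^\dagger A \psi$.) Similarly, we find that

$$\langle A^2 \rangle = |\alpha_1|^2 \lambda_1^2 + |\alpha_2|^2 \lambda_2^2.$$

So now we can calculate the uncertainty directly, but we can do some thinking first: we know that the uncertainty
should vanish when any of $\alpha_1$, $\alpha_2$, or $\lambda_1 - \lambda_2$ is zero. But we also need to make sure the quantity is always nonnegative and
real, so a good guess is

$$\Delta A(\psi) = |\alpha_1| |\alpha_2| |\lambda_1 - \lambda_2|.$$

(Remember that the uncertainty should have units of $A$.) At this point, we might just be off by a constant factor, but
of course this is just a guess. Anyway, we can calculate now:

$$(\Delta A)^2 = \lambda_1^2 |\alpha_1|^2 + \lambda_2^2 |\alpha_2|^2 - \left( \lambda_1^2 |\alpha_1|^4 + \lambda_2^2 |\alpha_2|^4 + 2\lambda_1 \lambda_2 |\alpha_1|^2 |\alpha_2|^2 \right).$$

Collecting terms, noting that $|\alpha_1|^2 - |\alpha_1|^4 = |\alpha_1|^2 (1 - |\alpha_1|^2) = |\alpha_1|^2 |\alpha_2|^2$ (and similarly for $\alpha_2$),

$$(\Delta A)^2 = \lambda_1^2 |\alpha_1|^2 |\alpha_2|^2 + \lambda_2^2 |\alpha_2|^2 |\alpha_1|^2 - 2\lambda_1 \lambda_2 |\alpha_1|^2 |\alpha_2|^2$$

simplifies nicely to

$$(\Delta A)^2 = (\lambda_1 - \lambda_2)^2 |\alpha_1|^2 |\alpha_2|^2,$$

and indeed our earlier guess is accurate!


17 Uncertainty Principle and Compatible Observables, Part 2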


Last time, we talked about the energy-time uncertainty relations, which tell us something about how fast the state
can change. One interesting way to analyze this is to look at the inner product

$$\langle \psi(0) | \psi(t) \rangle = \int dx \, \psi^*(t = 0, x) \, \psi(t, x).$$

In a sense, this tells us how quickly a state can change: at $t = 0$, this overlap is 1, and perhaps after some time
the overlap is 0. (At that point, we can say that the state has changed a lot.) To make this quantity a bit easier
to work with, we might as well take its squared norm $|\langle \psi(0) | \psi(t) \rangle|^2$. If we assume that the system is governed by a
time-independent Hamiltonian (which will help us prove the time-energy uncertainty relationship we established at
the end of last lecture), we can consider the case in which $\psi$ is some energy eigenstate. Such states evolve with a
phase $e^{-iEt/\hbar}$, so the squared overlap would remain equal to 1 for all times $t$.

And in general, we can evaluate $\psi(t)$ as a Taylor series in $t$: if we go to quadratic order, we'd find that this
overlap only depends on things like $\Delta H$ and $\Delta t$. This kind of analysis has to do with quantum computation: in a
quantum computer, we want to change states quickly, and these inequalities limit the speed of a quantum computer!


Now that we've discussed a lot of properties of uncertainty, we'll do an example: 


Example 166 


Consider the Hamiltonian (for a one-dimensional particle)

$$H = \frac{p^2}{2m} + \alpha x^4,$$

where $\alpha > 0$.

The ground state energy is the expectation value of $H$ in the ground state, and we previously used the variational principle to find
an upper bound on it. We're going to use the uncertainty principle now to get a lower bound,
so we have a window for the energy of the ground state. First of all, we know that

$$\langle H \rangle_{gs} = \frac{\langle p^2 \rangle_{gs}}{2m} + \alpha \langle x^4 \rangle_{gs},$$

and now we know that $\langle x \rangle_{gs} = 0$, because symmetric potentials have either symmetric or antisymmetric wavefunctions
(and it's not antisymmetric because it's a ground state). Similarly, the expectation of the momentum $\langle p \rangle_{gs} = 0$ as well,
because we can imagine computing it (taking the bound state wavefunction $\psi$ to be real):

$$\langle p \rangle_{gs} = \int dx \, \psi \, \frac{\hbar}{i} \frac{d\psi}{dx},$$

and this integrand is a total derivative (it's $\frac{d}{dx}$ of a constant times $\psi^2$), so it evaluates to zero for a bound state where
the value is zero at both ends. So now we can control the $\langle p^2 \rangle_{gs}$ term: we know that

$$(\Delta p)^2 = \langle p^2 \rangle - \langle p \rangle^2,$$

so in the ground state, we know that $(\Delta p)^2_{gs} = \langle p^2 \rangle_{gs}$, and now we can get a $\Delta p$ into our expression. The main
problem is that we need to deal with $\langle x^4 \rangle$, and now we're going to use the fact that $\langle x^4 \rangle \ge \langle x^2 \rangle^2$ (using the fact that
$\langle A^2 \rangle \ge \langle A \rangle^2$ for the operator $A = x^2$), and thus we can get a $\Delta x$ into our expression as well! So now

$$\langle x^4 \rangle_{gs} \ge \langle x^2 \rangle_{gs}^2 = (\Delta x)^4_{gs},$$

where we've used that $\langle x^2 \rangle = (\Delta x)^2$ in the ground state, and thus

$$\langle H \rangle_{gs} = \frac{(\Delta p)^2_{gs}}{2m} + \alpha \langle x^4 \rangle_{gs} \ge \frac{(\Delta p)^2_{gs}}{2m} + \alpha (\Delta x)^4_{gs}.$$

And this is all good (we're on track to get a lower bound for $\langle H \rangle$), and now we're ready to use the uncertainty
principle! Since $\Delta p \, \Delta x \ge \frac{\hbar}{2}$ in any state, this holds for the ground state, and thus $\Delta p \ge \frac{\hbar}{2\Delta x}$. Plugging this in, we find
that

$$\langle H \rangle_{gs} \ge \frac{\hbar^2}{8m (\Delta x)^2_{gs}} + \alpha (\Delta x)^4_{gs}.$$

And now we have an inequality, but we don't know the actual value of $\Delta x$. Fortunately, if we minimize the right
hand side over all $\Delta x$, we'll get a bound that's true regardless of what $\Delta x$ actually is! So

$$\langle H \rangle_{gs} \ge \min_{\Delta x} \left( \frac{\hbar^2}{8m (\Delta x)^2} + \alpha (\Delta x)^4 \right),$$

and this is now just a calculus problem: we can take the derivative with respect to $\Delta x$ and set it equal to 0. It turns
out that $\frac{A}{x^2} + Bx^4$ is minimized at $x^2 = \frac{1}{2^{1/3}} \left( \frac{A}{B} \right)^{1/3}$, and the value of the function there is $2^{1/3} \cdot \frac{3}{2} A^{2/3} B^{1/3}$. So plugging in
the coefficients, we find our final answer:

$$\langle H \rangle_{gs} \ge 2^{1/3} \cdot \frac{3}{8} \left( \frac{\hbar^2 \sqrt{\alpha}}{m} \right)^{2/3} \approx 0.4724 \left( \frac{\hbar^2 \sqrt{\alpha}}{m} \right)^{2/3}.$$

This turns out to be an okay bound: the actual answer has a constant of 0.668, and the variational principle gave
something like 0.69. But this gave us something, and the point is that this is completely rigorous! Sometimes
the uncertainty principle is used to make a handwavy argument which is basically just dimensional analysis, but every
inequality we've established here is exact.
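The minimization step can be double-checked numerically. The sketch below works in units where $\hbar = m = \alpha = 1$ (an assumption made so that the bound is just the numerical coefficient) and compares a brute-force minimum over a grid of $\Delta x$ values with the closed form quoted above; the grid range is an illustrative choice.

```python
import numpy as np

hbar = m = alpha = 1.0                       # units chosen so the bound is a pure number
A = hbar ** 2 / (8 * m)                      # coefficient of 1 / (Delta x)^2
B = alpha                                    # coefficient of (Delta x)^4

def rhs(dx):
    # Right-hand side of the inequality as a function of Delta x
    return A / dx ** 2 + B * dx ** 4

dx_grid = np.linspace(0.2, 2.0, 200001)
numeric_min = rhs(dx_grid).min()             # brute-force minimum on the grid

closed_form = 2 ** (1 / 3) * 1.5 * A ** (2 / 3) * B ** (1 / 3)
coefficient = 2 ** (1 / 3) * 3 / 8           # prefactor of (hbar^2 sqrt(alpha) / m)^(2/3)
dx_star = (A / (2 * B)) ** (1 / 6)           # minimizing Delta x, from x^2 = (A / 2B)^(1/3)
```

The grid minimum, the closed form, and the coefficient $2^{1/3} \cdot 3/8$ should all agree, reproducing the constant in the final bound.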

This concludes our initial discussion of uncertainty, and we're going to move on to a new topic now: diagonalization 
of operators. Essentially, suppose we have some operator which is important to us. To understand it better, we want 


to find an ideal basis so that the operator is as simple as possible in this basis. 


Definition 167 


An operator T is diagonalizable if there is a basis in which the matrix representation of T is diagonal (only the 


diagonal entries are allowed to be nonzero). 


Conceptually, suppose we have a diagonal matrix for $T$ in some basis $(u_1, \cdots, u_n)$: then the matrix action on our
basis looks like

$$T u_i = \sum_k T_{ki} u_k,$$

but the only nonzero term here is the one where $k = i$, so this is actually just equal to $T_{ii} u_i$ (with no sum over $i$). This is some number
times $u_i$, call it $\lambda_i$, and now we know that $Tu_1 = \lambda_1 u_1$, $Tu_2 = \lambda_2 u_2$, and so on, which means that the basis vectors
are eigenvectors of our operator. But the logic goes both ways here: if we have a set of eigenvectors spanning the
space, we can just pick a basis from among them. That gives us the following result:


Proposition 168 


An operator is diagonalizable if and only if it has a set of eigenvectors that span the vector space.


Example 169

We know that $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ is not diagonalizable.

This is because the characteristic equation is $\lambda^2 = 0$, so the only eigenvalue is $\lambda = 0$. But the only eigenvectors
there are of the form

$$\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} b \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$

which means we're forced to have $b = 0$. Thus there is only one dimension of eigenvectors (along $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$), but we have
a two-dimensional vector space. So it's impossible to diagonalize this linear operator!
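The counting argument can be confirmed numerically. This sketch assumes the matrix in the example is the one with a single 1 in the upper-right corner (consistent with the eigenvector pointing along the first basis vector), and counts the dimension of the eigenspace for $\lambda = 0$ using the rank.

```python
import numpy as np

# Assumed form of the matrix in the example: a single 1 in the upper-right corner
N = np.array([[0.0, 1.0],
              [0.0, 0.0]])

eigenvalues = np.linalg.eigvals(N)           # both roots of lambda^2 = 0

# Eigenvectors for lambda = 0 are the null space of N, whose dimension is
# (number of columns) - rank
nullity = N.shape[1] - np.linalg.matrix_rank(N)

# The first basis vector is an eigenvector; the second one is not
e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])
```

Since `nullity` is 1 while the space is two-dimensional, the eigenvectors cannot span the space.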

So now we'll be a bit more concrete: say we have a vector space with some basis $(v_1, \cdots, v_n)$, and we have some
linear operator $T$. Its matrix representation $T_{ij}(\{v\})$ has no particular reason to be diagonal, and we want to figure
out a concrete condition to understand whether we can change bases to make the matrix diagonal. Recall that we
use an invertible linear operator $A$ to change the vectors such that our new basis is $u_k = Av_k$ (for all $1 \le k \le n$). We
proved that there is a relationship between the matrix elements in the two bases: we have

$$T(\{u\}) = A^{-1} T(\{v\}) A,$$

or more explicitly, the element

$$T_{ij}(\{u\}) = (A^{-1})_{ik} T_{kp}(\{v\}) A_{pj},$$

where we're summing over $p$ and $k$. So we want to find a matrix $A$ where this works out, and an important idea at
this point is that there are two different ways to think about diagonalization: one is that we're changing bases to
make our operator diagonal in the new basis, and the other is that we're finding a new operator $A^{-1}TA$ which is
diagonal in our original basis. We can write this out more explicitly: suppose that $Tu_i = \lambda_i u_i$, where the $u_i$ are our
eigenvectors for $T$. (We're not summing over $i$ here; this is a problem with notation.) Then to show that $A^{-1}TA$ is
diagonal in our $v_i$ basis, we know that

$$Tu_i = \lambda_i u_i \implies TAv_i = \lambda_i Av_i \implies A^{-1}TAv_i = \lambda_i v_i.$$

And now, indeed $A^{-1}TA$ has eigenvectors equal to our basis elements, so $A^{-1}TA$ is diagonal in our $v$ basis.


Fact 170 


Notice that the columns of $A$ are the eigenvectors of $T$. This is because $u_k = Av_k$, and $A$ acting on $v_k$ gives us
$\sum_i A_{ik} v_i$. And indeed, this tells us that the $k$th column of $A$ should be $u_k$, as long as we think of $v_k$ as a column
vector with a 1 in the $k$th entry and a 0 everywhere else.
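Here is a minimal sketch of this fact for a small matrix. The matrix `T` is a hypothetical example with two distinct eigenvalues; `np.linalg.eig` returns the eigenvectors as the columns of a matrix, which plays the role of $A$.

```python
import numpy as np

# Hypothetical diagonalizable (but not symmetric) matrix in the v basis
T = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Columns of A are eigenvectors u_k of T, so that u_k = A v_k
eigenvalues, A = np.linalg.eig(T)

# Change of basis: A^{-1} T A should be diagonal with the eigenvalues on the diagonal
D = np.linalg.inv(A) @ T @ A

# Each column of A satisfies the eigenvector equation T u_k = lambda_k u_k
column_checks = [np.allclose(T @ A[:, k], eigenvalues[k] * A[:, k]) for k in range(2)]
```

The matrix `D` comes out diagonal, with the eigenvalues of `T` (here 2 and 5, in whatever order the routine returns them) on the diagonal.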


Beyond the idea of diagonalization, though, we want to talk about a more relevant term for our Hermitian operators: 


Definition 171 


A matrix is unitarily diagonalizable if there exists an orthonormal basis of eigenvectors. 


This is a stronger condition than just being diagonalizable, and being able to achieve this is very good: this is
because we've then broken down our vector space into eigenspaces that are all mutually orthogonal! Concretely, imagine that
we start with some orthonormal basis $\{v\}$: we can then pass to some other orthonormal basis $\{u\}$ with some operator
(analogous to the $A$ above), and remember that we achieve this with a unitary operator. So a matrix that is unitarily
diagonalizable must satisfy

$$T(\{u\}) = U^\dagger T(\{v\}) U$$

for a unitary operator $U$, with $T(\{u\})$ diagonal. In words, this means there is a unitary operator which takes us to the “privileged basis”
in which our operator is diagonal.

And the main theorem of this subject is one of the most important theorems of linear algebra: we can characterize
the set of operators for which we can have an orthonormal basis of eigenvectors. It turns out that Hermitian operators
are indeed unitarily diagonalizable, but that's not actually the complete result. Here's the class of operators we care
about:


Definition 172 


An operator $M$ is normal if $[M^\dagger, M] = 0$.

So Hermitian operators are normal, because $M^\dagger$ and $M$ are the same matrix. Similarly, anti-Hermitian operators
are also normal, because $M^\dagger$ is $-M$, and unitary operators are normal because $U^\dagger U$ and $UU^\dagger$ are both the identity
matrix. So many of the nice classes of operators we've been talking about are all normal!


Proposition 173 
If $M$ is a normal operator, and $|\psi\rangle$ is an eigenvector of $M$ with eigenvalue $\lambda \in \mathbb{C}$, then $|\psi\rangle$ is also an eigenvector of
$M^\dagger$ with eigenvalue $\lambda^*$.

The usual strategy for proving something like this is to show that $(M^\dagger - \lambda^*)\psi$ is the zero vector by showing that it has
zero norm. And with this, we're ready to get to the result that we've been working towards:


Theorem 174 (Spectral theorem) 


Let $M$ be an operator in a complex vector space $V$. Then $V$ has an orthonormal basis of eigenvectors of $M$ if and only if $M$ is normal.


To prove this, we need to prove that any unitarily diagonalizable operator is normal, and also that any normal operator can be unitarily diagonalized.


Proof sketch. Suppose $M$ is unitarily diagonalizable. Then there is a unitary operator $U$ such that $U^\dagger M U = D_M$ for some diagonal matrix $D_M$, which means that
\[ M = U D_M U^\dagger. \]

Therefore,
\[ M^\dagger = (U D_M U^\dagger)^\dagger = U D_M^\dagger U^\dagger. \]

Now to check that the matrix is normal, we need to check that the commutator is zero:
\[ M^\dagger M = U D_M^\dagger D_M U^\dagger, \]
where we've used that the middle $U^\dagger U$ is just the identity matrix, and
\[ M M^\dagger = U D_M D_M^\dagger U^\dagger. \]

But these are the same, because any two diagonal matrices commute (we just multiply elements along the diagonal)! So indeed $M$ is normal.

The other part of the proof (showing that a normal operator is unitarily diagonalizable) is done by induction, and the idea is that any operator on a finite-dimensional complex vector space has at least one eigenvalue with a corresponding eigenvector. Use that eigenvector as our first basis vector, and show that we can reduce the matrix so that there are zeros in the first row and column apart from the diagonal entry; now do this again step by step with the remaining smaller matrix. The point is that normality allows us to show that we can indeed get zeros in the off-diagonal entries, and eventually we'll have a diagonal matrix as desired. (It’s good for us to read through the proof, because it will make a lot of what’s going on more clear!)


Our final topic of this lecture is simultaneous diagonalization, and we're going to focus on Hermitian operators from here on out. This is one of the most important ideas in quantum mechanics, because it’s what allows us to label and understand the states of a system! For example, if we have a set of energy eigenstates, but there is a degeneracy in the eigenvalues, we might have a lot of states with the same energy. And we need to be able to distinguish these states (they are different, or else they'd be the same state), so there is likely some other physical property corresponding to some other operator, and we'll want to simultaneously diagonalize these two operators so that we can characterize the states with the two properties at the same time.


Definition 175 


Two linear operators S and T are simultaneously diagonalizable if there is a basis in which every basis vector is 


an eigenstate of both S and T. 


Proposition 176 


If S and T are simultaneously diagonalizable, S and T must commute. 


Proof. The fact that two operators commute (or don't) is a basis-independent statement. If $S$ and $T$ are simultaneously diagonalizable, there is a basis in which both $S$ and $T$ have diagonal matrices. Since diagonal matrices always commute, $S$ and $T$ must commute in any basis.


Commuting is not a sufficient condition by itself, though: not every matrix is diagonalizable. But we do know that normal operators are diagonalizable, and now we can make a plausible claim:


Theorem 177 


If S and T are commuting Hermitian operators, then they can be simultaneously diagonalized. 


This result is easy to prove in the case where there are no degeneracies. Remember that a degenerate spectrum is a situation where an eigenvalue is repeated. We then have three different cases: either (1) both spectra are non-degenerate, (2) exactly one is non-degenerate, or (3) both are degenerate. We can prove cases (1) and (2) together, and we'll do that first: suppose that the operator with a non-degenerate spectrum is $T$.


Proof when T's spectrum is non-degenerate. In this case, there exists an orthonormal basis $(u_1, \dots, u_n)$ by the spectral theorem, such that
\[ T u_i = \lambda_i u_i \]
for all $i$, and $\lambda_i \neq \lambda_j$ for all $i \neq j$. So now each of the eigenvectors $u_i$ generates a one-dimensional $T$-invariant subspace: we now want to know what happens to these $u_i$ under $S$. Note that
\[ S T u_i = \lambda_i S u_i, \]
but $S$ and $T$ commute, so we also have
\[ S T u_i = T S u_i, \]
and therefore $T(S u_i) = \lambda_i (S u_i)$. So the vector $S u_i$ must belong to the invariant subspace spanned by $u_i$, because it is either zero or an eigenvector of $T$ with eigenvalue $\lambda_i$ (and there’s only a one-dimensional space of vectors for which this is true)! So $S u_i = \omega_i u_i$ for some $\omega_i$, and indeed we've shown that $u_i$ is also an eigenvector of $S$ (possibly with a different eigenvalue). So eigenstates of $T$ are also eigenstates of $S$, and we've indeed shown that $S$ and $T$ have a common set of eigenvectors, as desired.
The other case is a bit more interesting: 


Proof when operators have degeneracies. If $S$ has degeneracies, some eigenvalues will have eigenspaces with dimension larger than 1. Let $U_k$ denote the set of all vectors $u$ such that $Su = \lambda_k u$, and say that this space $U_k$ has dimension $d_k$. In other words, the $k$th eigenvalue has a corresponding space $U_k$ such that the entire space is scaled by the same amount $\lambda_k$. And our vector space can be decomposed as
\[ V = U_1 \oplus U_2 \oplus \cdots \oplus U_m, \]
where these subspaces $U_k$ can have different dimensions: some might have no degeneracy, while others have degeneracy three.
Regardless, we can denote the basis of eigenvectors for $U_k$ as $(u^{(k)}_1, u^{(k)}_2, \dots, u^{(k)}_{d_k})$, and by the spectral theorem we can take this to be an orthonormal basis. So we have a basis for $V$ by putting all of these basis elements together: we can thus say that
\[ \left(u^{(1)}_1, \dots, u^{(1)}_{d_1},\; u^{(2)}_1, \dots, u^{(2)}_{d_2},\; \dots,\; u^{(m)}_1, \dots, u^{(m)}_{d_m}\right) \]
is a basis for $V$. And we know that $S$ is indeed diagonal in this basis, because every vector is an eigenvector of $S$ by construction: the first $d_1$ diagonal entries are $\lambda_1$, the next $d_2$ are $\lambda_2$, and so on.

So this is a good basis, but there’s another basis that also works well: we can consider the basis
\[ \left(V_1 u^{(1)}_1, \dots, V_1 u^{(1)}_{d_1},\; \dots,\; V_m u^{(m)}_1, \dots, V_m u^{(m)}_{d_m}\right), \]
where $V_1, \dots, V_m$ are some arbitrary unitary operators acting on the spaces $U_1, \dots, U_m$. Basically, we take the subspace $U_1$ and act with a unitary operator $V_1$ on it, take the subspace $U_2$ and act with a unitary operator $V_2$ on it, and so on. Because our operators are unitary, we still have an orthonormal basis for each $U_k$, and thus we still have an orthonormal basis of $V$ here (because the different spaces with different eigenvalues are already orthogonal)!

So now here’s the catch: the spaces $U_k$ are $S$-invariant subspaces, and we want to show that they are also $T$-invariant! Suppose that $u \in U_k$: then
\[ S(Tu) = T(Su) = \lambda_k (Tu), \]
so the vector $Tu$ (if nonzero) is also an eigenvector of $S$ with eigenvalue $\lambda_k$. Since we defined $U_k$ to be the space of eigenvectors with eigenvalue $\lambda_k$, $Tu \in U_k$ as well! So the idea now is that $T$ preserves the invariant subspaces: it’s in the block form
\[ T = \begin{pmatrix} D_1 & & & \\ & D_2 & & \\ & & \ddots & \\ & & & D_m \end{pmatrix}, \]
where $D_k$ is a $d_k \times d_k$ block matrix. Right now, we haven't diagonalized $T$ yet, but now we can take those arbitrary unitary operators $V_k$ that we defined above. Since $T$ is Hermitian, it’s also Hermitian on each diagonal block, so we can diagonalize each block with a unitary operator: those are the $V_k$s that we use! So once we do this for each block, we have diagonalized $T$ without destroying the diagonalization of $S$, as desired.

This argument also extends to an arbitrary number of operators: if $S_1, \dots, S_n$ are all mutually commuting Hermitian operators, then we can simultaneously diagonalize them.
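To make the degenerate case concrete, here is a small sketch with a hand-picked pair of commuting Hermitian matrices. The matrices `S` and `T`, the candidate basis, and the helper names are all assumptions chosen for illustration: `S` has a two-dimensional eigenspace, and `T` is block diagonal with respect to it.

```python
import math

# Illustrative sketch: S has a degenerate eigenvalue, T commutes with S,
# and a rotated basis inside the degenerate eigenspace diagonalizes both.

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matvec(m, v):
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def eigenvalue_or_none(m, v, tol=1e-12):
    """Return lam if m v = lam v for the real nonzero vector v, else None."""
    mv = matvec(m, v)
    lam = sum(x * y for x, y in zip(v, mv)) / sum(x * x for x in v)
    is_eigenvector = all(abs(mv[i] - lam * v[i]) < tol for i in range(len(v)))
    return lam if is_eigenvector else None

# S has eigenvalue 1 on a two-dimensional eigenspace and eigenvalue 2 elsewhere.
S = [[1, 0, 0], [0, 1, 0], [0, 0, 2]]
# T is Hermitian and block diagonal with respect to the eigenspaces of S.
T = [[0, 1, 0], [1, 0, 0], [0, 0, 5]]

commute = matmul(S, T) == matmul(T, S)

# Candidate common eigenbasis: rotate within the degenerate eigenspace of S.
r = 1 / math.sqrt(2)
common_basis = [[r, r, 0.0], [r, -r, 0.0], [0.0, 0.0, 1.0]]
labels = [(eigenvalue_or_none(S, v), eigenvalue_or_none(T, v))
          for v in common_basis]

print(commute)
print(labels)
# The original basis vector e_1 is an eigenvector of S but not of T.
print(eigenvalue_or_none(S, [1.0, 0.0, 0.0]), eigenvalue_or_none(T, [1.0, 0.0, 0.0]))
```

Each basis vector ends up labeled by a pair (eigenvalue of `S`, eigenvalue of `T`), and the pairs are distinct even though `S` alone cannot tell the first two vectors apart. This is the labeling role that simultaneous diagonalization plays for degenerate states.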


18 March 2, 2020 


There's a test next week, Wednesday night at 7:30 (in our regular recitation room). MIT regulations require lectures/recitations to be canceled on the day of an exam, so we will not have class on Wednesday, March 11. The exam should take about an hour and a half, but the room is booked for two hours.

We'll get a formula sheet with main results, so a good way to start studying is to read over the formulas and make sure we understand everything there. The exam itself will have some wave mechanics, spin 1/2, uncertainty, and some (about a third) linear algebra.

Today, we'll discuss operators, bras and kets, and other related topics. We'll start a real treatment of time evolution soon: next lecture, we'll see an argument for why having a unitary operator that evolves states implies the Schrodinger equation.


Example 178 


Consider a time-independent Hamiltonian $H$: that means that there is no explicit $t$. Then how can we solve the Schrodinger equation $i\hbar\frac{\partial \Psi}{\partial t} = H\Psi$? (Note that $\Psi$ denotes “full” wavefunctions in terms of both $x$ and $t$.)


The idea is that we can write the energy eigenstates in the separable form
\[ \Psi(x, t) = \psi(x)\, e^{-iEt/\hbar} \]
as long as $H\psi = E\psi$. The time-dependent phase $e^{-iEt/\hbar}$ only comes up in the final form of the solution: it does not directly affect the $\psi(x)$ component! And now this means that if we can write our initial wavefunction as a linear combination of energy eigenstates
\[ \Psi(x, 0) = \sum_n a_n \psi_n(x), \qquad H\psi_n(x) = E_n \psi_n(x), \]
then we can just evolve the “basic solutions” individually:
\[ \Psi(x, t) = \sum_n a_n \psi_n(x)\, e^{-iE_n t/\hbar}. \]

Another way to say this is that we have a unitary operator $U(t) = e^{-iHt/\hbar}$ which does the time evolution for us. This operator has the property that
\[ \Psi(t) = U(t)\Psi(t = 0). \]

Specifically, what’s happening is that when we have $e^{-iHt/\hbar}$ acting on our eigenstate $\psi_n(x)$, $H$ always gives $E_n$. So the exponential of $H$ hitting $\psi_n$ just gives the exponential of $E_n$ instead!
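The recipe above (expand in energy eigenstates, then attach a phase to each coefficient) can be sketched in a few lines. The two energies, the initial coefficients, and the choice $\hbar = 1$ below are illustrative assumptions, not values from the lecture.

```python
import cmath
import math

HBAR = 1.0  # natural units (an assumption for this illustration)

def evolve(coeffs, energies, t):
    """Multiply each energy-eigenstate coefficient a_n by exp(-i E_n t / hbar)."""
    return [a * cmath.exp(-1j * e * t / HBAR) for a, e in zip(coeffs, energies)]

def norm_sq(coeffs):
    """Total probability, assuming the eigenstates are orthonormal."""
    return sum(abs(a) ** 2 for a in coeffs)

energies = [1.0, 3.0]                       # hypothetical E_1, E_2
a0 = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # equal superposition at t = 0

# The relative phase between the two components has this period.
period = 2 * math.pi * HBAR / (energies[1] - energies[0])

a_half = evolve(a0, energies, period / 2)
relative_phase = a_half[1] / a_half[0]

print(norm_sq(a_half))    # the norm is unchanged by the phases
print(relative_phase)     # the relative phase has flipped sign at half a period
```

Only the relative phase between components is physically meaningful here: the norm stays fixed, which is what we expect from a unitary evolution.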


Remark 179. Things will get a bit more sophisticated starting next lecture, where we'll consider the time-evolution operator in general. If $H$ depends on time, this can look much more complicated. But if $H$ depends on time and the Hamiltonians at different times commute with each other, we have a simple generalized formula. (And if $H$ rotates from one component of spin to another or does something else crazy, the formula is just messy.)


Let's now discuss some of the diagonalization arguments we've been making. There are a few points to pay attention to for the Spectral Theorem:

* The theorem is usually stated for Hermitian operators, but it’s more generally true for normal operators: that is, operators $M$ such that $[M^\dagger, M] = 0$.


* The result says that there is an orthonormal basis of eigenvectors, which actually tells us a lot: it gives 
as many eigenvectors as the dimension of our vector space. In addition, this basis has very nice properties, 
because all of the eigenvectors are perpendicular. (Note that there isn't a unique orthonormal basis, because of 
degeneracies in eigenvalues. For example, if we have a plane of eigenvectors with the same eigenvalue, then we 


can pick any two orthonormal vectors in that plane.) 


We've probably heard the phrase unitarily diagonalizable: what this really means is that if we start with a normal (but not diagonal) operator $M$ on an orthonormal basis $(e_1, \dots, e_n)$, we can apply a unitary transformation to turn our basis into $(\tilde e_1, \dots, \tilde e_n)$, in which $M$ is diagonal:
\[ U^\dagger M U = D_M. \]

And it makes sense that we want a unitary operator $U$: such operators are those that preserve the inner product.


Fact 180 


By the way, we know that any operator on a finite-dimensional complex vector space has at least one eigenvector: this is because the eigenvalues $\lambda$ of an operator satisfy
\[ \det(M - \lambda I) = 0. \]

This gives a degree $n$ polynomial equation in $\lambda$, and the Fundamental Theorem of Algebra says that this always has a solution.


This is particularly important, because we show the Spectral Theorem by induction, extracting one eigenvalue at a 


time and showing that we can diagonalize the resulting matrix. 


Problem 181 


Is the matrix $M = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}$ diagonalizable?


Because the determinant of $M - \lambda I$ is just $(1 - \lambda)^3$, the only eigenvalue of this matrix is 1. But then if we try to solve the equation $v = Mv$,
\[ v = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} v_1 + v_2 + v_3 \\ v_2 + v_3 \\ v_3 \end{pmatrix} = Mv \]
only has solutions where $v_2 = v_3 = 0$, so we only have one (set of) eigenvector(s). Basically, what's going on is that we have a “shear matrix,” so things aren’t being scaled in the same way as they would in (for example) a Hermitian operator.

Another way to think of this is that if we had three eigenvalues of 1, we would want to diagonalize our matrix into
\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \]
the identity matrix. But this is definitely not going to happen, because for any unitary operator $U$, $U^\dagger I U = U^\dagger U = U^{-1}U = I$ is just the identity matrix, so it can’t turn into $M$.
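A quick way to see the same obstruction numerically is to look at $N = M - I$. The sketch below assumes the upper-triangular matrix read off from the expression for $Mv$ in this problem; the helper names are illustrative. If $M$ were diagonalizable with only the eigenvalue 1, then $N$ would have to vanish, and it does not.

```python
# Sketch: N = M - I is nonzero but nilpotent, so M cannot be diagonalized.

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matvec(m, v):
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

M = [[1, 1, 1],
     [0, 1, 1],
     [0, 0, 1]]
N = [[M[i][j] - (1 if i == j else 0) for j in range(3)] for i in range(3)]

N2 = matmul(N, N)
N3 = matmul(N2, N)

print(N)    # nonzero, so M is not similar to the identity
print(N2)   # still nonzero
print(N3)   # the zero matrix: N is nilpotent

# Eigenvectors of M with eigenvalue 1 are exactly the vectors killed by N.
print(matvec(N, [7, 0, 0]))  # a multiple of e_1 is an eigenvector
print(matvec(N, [0, 1, 0]))  # e_2 is not
print(matvec(N, [0, 0, 1]))  # e_3 is not
```

Since $N v = (v_2 + v_3,\, v_3,\, 0)$, the only vectors with $Nv = 0$ are multiples of $e_1$, in agreement with the hand computation above.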


We'll finish with some exercises involving bra-ket notation:

Problem 182


Consider the operator $T = |u\rangle\langle v|$. What is the operator $T^\dagger$, and what is the trace of $T$?


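We know that $T$ applied to any vector will give $c\,|u\rangle$ for some complex number $c$. (Specifically, we have $T|w\rangle = |u\rangle\langle v|w\rangle$.) Thus, $T$ maps every vector into the one-dimensional space spanned by $|u\rangle$, and the only eigenvalue that can be nonzero comes from
\[ T|u\rangle = |u\rangle\langle v|u\rangle \implies \lambda = \langle v, u\rangle, \]
so that is also the trace (sum of the eigenvalues) of $T$. Another way is to write
\[ \operatorname{tr}(T) = \sum_i T_{ii} = \sum_i \langle i|T|i\rangle = \sum_i \langle i|u\rangle\langle v|i\rangle, \]
which we then rewrite as
\[ \sum_i \langle v|i\rangle\langle i|u\rangle = \langle v|\Big(\sum_i |i\rangle\langle i|\Big)|u\rangle = \langle v|u\rangle = \langle v, u\rangle. \]

Bra-ket notation helps us find the adjoint as well:
\[ T|w\rangle = |u\rangle\langle v|w\rangle \implies \langle w|T^\dagger = \langle u|\,\langle w|v\rangle \]
(because $\langle v|w\rangle$ is just a complex number, so the adjoint turns it into its conjugate). And now we can just move things around:
\[ \langle w|T^\dagger = \langle w|v\rangle\langle u|, \]
so we have the simple form $\boxed{T^\dagger = |v\rangle\langle u|}$.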
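Both results can be checked in components. The sketch below assumes two arbitrary complex vectors `u` and `v` (illustrative values) and represents $|u\rangle\langle v|$ by the matrix with entries $u_i v_j^*$; the helper names are not from the lecture.

```python
# Sketch: check tr(|u><v|) = <v|u> and (|u><v|)^dagger = |v><u| in components.

def outer(u, v):
    """Matrix of |u><v|, with entries u_i * conj(v_j)."""
    return [[ui * vj.conjugate() for vj in v] for ui in u]

def inner(v, u):
    """Inner product <v|u> = sum_i conj(v_i) * u_i."""
    return sum(vi.conjugate() * ui for vi, ui in zip(v, u))

def dagger(m):
    n = len(m)
    return [[m[j][i].conjugate() for j in range(n)] for i in range(n)]

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

def matvec(m, v):
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

u = [1 + 2j, 3j]        # illustrative components of |u>
v = [2 - 1j, 1 + 1j]    # illustrative components of |v>
T = outer(u, v)

print(trace(T), inner(v, u))        # the two numbers agree
print(dagger(T) == outer(v, u))     # the adjoint is |v><u|
print(matvec(T, u), [inner(v, u) * ui for ui in u])  # T|u> = <v|u> |u>
```

The last line shows directly that $|u\rangle$ is an eigenvector of $T$ with eigenvalue $\langle v|u\rangle$.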


19 Quantum Dynamics 


In a lot of our study so far, we've been working with a vector space of states, and our states are wavefunctions. There's been no time in this vector space so far, but we do care about time in physics because we have clocks! So we can wait some time, and we will see that our vector has moved to some other vector in our state space. So this is the concept of dynamics in quantum mechanics: we need a picture to describe time evolution.

A picture to keep in mind is that we have a vector space $\mathcal{H}$ (for Hilbert space), and we'll have some state $|\psi, t_0\rangle$ at time $t_0$. Then $|\psi, t\rangle$ is some other state in our Hilbert space, but it should definitely have unit length if we normalize our states. So we can think of having a unit sphere on which all our vector tips live, and then our vector moves in time and traces out a trajectory, all while preserving the norm of the vector. (And if we don't use a normalized vector, we'll still preserve the norm; it'll just be a different value from 1.) We proved earlier on that an operator which preserves the length of all vectors is a unitary operator, and now we're going to make a physical postulate:


Proposition 183 


The state $|\psi, t\rangle$ is obtained by the action of a unitary operator from the state $|\psi, t_0\rangle$:


\[ |\psi, t\rangle = U(t, t_0)\,|\psi, t_0\rangle. \]


Here, $|\psi, t_0\rangle$ is some arbitrary state, so if we use some other arbitrary starting state $|\psi', t_0\rangle$, it also evolves with this formula, and the unitary operator $U$ is the same for all states! If we give this unitary operator $U$ any state, it'll tell us how it evolves in time.

This is actually a very big assumption: it turns out that this postulate already gives us the Schrodinger equation! We'll see that shortly.


Proposition 184 


The unitary operator U(t, to) is unique (if it exists). 


Proof. If two operators do the same thing to all vectors in our vector space, then they are the same operator.


We also know that if $U(t, t_0)$ is a unitary operator, we have
\[ (U(t, t_0))^\dagger\, U(t, t_0) = I. \]

This notation is a bit cumbersome, so we'll just write $(U(t, t_0))^\dagger$ as $U^\dagger(t, t_0)$: it means the same thing.


We can now establish a few important properties of this operator U: 


* $U(t_0, t_0) = I$: just plug in $t = t_0$ in our defining equation, and the only operator that leaves all states the same is the identity operator.

* Since we have
\[ |\psi, t_2\rangle = U(t_2, t_1)\,|\psi, t_1\rangle = U(t_2, t_1)U(t_1, t_0)\,|\psi, t_0\rangle \]
but also
\[ |\psi, t_2\rangle = U(t_2, t_0)\,|\psi, t_0\rangle, \]
we must have $\boxed{U(t_2, t_0) = U(t_2, t_1)U(t_1, t_0)}$. In other words, time composition works like matrix multiplication: we go from $t_0$ to $t_1$, then from $t_1$ to $t_2$.

* If we take $t_2 = t_0$ in the above equation, we find that
\[ I = U(t_0, t_0) = U(t_0, t_1)U(t_1, t_0). \]
Thus $U(t_0, t)$ and $U(t, t_0)$ are inverses for all $t$, and we can also write this as
\[ U^{-1}(t, t_0) = U(t_0, t) = U^\dagger(t, t_0) \]
(because $U$ is unitary). This means that we can delete the “inverse” or “dagger” from our operator by just flipping the order of the arguments.
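These three properties can be checked on a concrete example. The sketch below assumes a two-level system with the time-independent Hamiltonian $H = (\hbar\omega/2)\sigma_x$, for which $U(t, t_0) = e^{-iH(t - t_0)/\hbar} = \cos\theta\, I - i\sin\theta\,\sigma_x$ with $\theta = \omega(t - t_0)/2$. The model, the value of `OMEGA`, and the helper names are illustrative assumptions.

```python
import math

OMEGA = 1.3  # illustrative frequency (an assumption), with hbar = 1

def U(t, t0):
    """Closed form of exp(-i H (t - t0)) for H = (OMEGA / 2) * sigma_x."""
    theta = OMEGA * (t - t0) / 2
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -1j * s], [-1j * s, c]]

def dagger(m):
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

IDENTITY = [[1, 0], [0, 1]]
t0, t1, t2 = 0.2, 1.1, 2.7

starts_at_identity = close(U(t0, t0), IDENTITY)
is_unitary = close(matmul(dagger(U(t2, t0)), U(t2, t0)), IDENTITY)
composes = close(U(t2, t0), matmul(U(t2, t1), U(t1, t0)))
inverse_by_swapping = close(U(t0, t2), dagger(U(t2, t0)))

print(starts_at_identity, is_unitary, composes, inverse_by_swapping)
```

All four checks hold for this example. Note that the order in the composition matters in general: the later time interval goes on the left.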


With this, we're now ready to find the Schrodinger equation. Since the Schrodinger equation is a differential equation, we'll try differentiating the time-evolution equation. We have that
\[ \frac{\partial}{\partial t}|\psi, t\rangle = \left(\frac{\partial}{\partial t}U(t, t_0)\right)|\psi, t_0\rangle, \]
where we only differentiate the operator because $|\psi, t_0\rangle$ is a fixed state independent of $t$. We want to get an equation for $|\psi, t\rangle$ out of this, and we have $|\psi, t_0\rangle$ on the right side of the equation, so we can use $|\psi, t_0\rangle = U(t_0, t)\,|\psi, t\rangle$ to rewrite this as
\[ = \frac{\partial U(t, t_0)}{\partial t}\, U(t_0, t)\,|\psi, t\rangle. \]

So we now have a complicated operator acting on $|\psi, t\rangle$, but we can rewrite this as
\[ = \frac{\partial U(t, t_0)}{\partial t}\, U^\dagger(t, t_0)\,|\psi, t\rangle \]
(so that we have $t$ and $t_0$ in the same order), and we'll call this operator $A(t, t_0)$. So now we have
\[ \frac{\partial}{\partial t}|\psi, t\rangle = A(t, t_0)\,|\psi, t\rangle. \]


We want to learn some important properties of A at this point, so that we can turn this more into the Schrodinger 


equation: 


Lemma 185 


The operator A is anti-Hermitian. 


Proof. Note that (to apply a dagger, we reverse all the operators in a product and put daggers on them)
\[ A^\dagger = U(t, t_0)\,\frac{\partial U^\dagger}{\partial t}(t, t_0), \]
because the time-derivative doesn’t interfere with daggers. To justify this last point, we're taking an operator at two slightly different times and subtracting them. Since $X^\dagger - Y^\dagger = (X - Y)^\dagger$ for any operators $X$ and $Y$, the dagger indeed goes through the derivative. And now this is $-A$, and we can show that by noting that
\[ U(t, t_0)\, U^\dagger(t, t_0) = I, \]
and now differentiating this with respect to time: the product rule tells us that
\[ \frac{\partial U}{\partial t}(t, t_0)\,U^\dagger(t, t_0) + U(t, t_0)\,\frac{\partial U^\dagger}{\partial t}(t, t_0) = 0 \implies A + A^\dagger = 0, \]
as desired.


The next point of business is to get rid of the to in A: 


Proposition 186 


The operator $A$ is independent of $t_0$: that is, $A(t, t_0) = A(t, t_1)$ for any $t_0, t_1$.


In particular, we can take $t_1 = t_0 + \epsilon$ and take the limit as $\epsilon \to 0$, which tells us that the derivative of $A$ with respect to its second argument is zero everywhere. That means that $A$ is indeed absolutely independent of the second argument.


Proof. We know that
\[ A(t, t_0) = \frac{\partial U(t, t_0)}{\partial t}\,U^\dagger(t, t_0) = \frac{\partial U(t, t_0)}{\partial t}\,U(t_0, t_1)\,U^\dagger(t_0, t_1)\,U^\dagger(t, t_0), \]
where we've introduced an identity operator between the two terms. But now we can group the first two terms together: even though the derivative only acts on the first term, it’s also okay for it to act on the first two terms (because $U(t_0, t_1)$ has no dependence on $t$ anyway). And similarly, we can rewrite the last two terms:
\[ = \frac{\partial}{\partial t}\big(U(t, t_0)U(t_0, t_1)\big)\, U(t_1, t_0)U(t_0, t). \]
Now by composition, this is equal to
\[ = \frac{\partial}{\partial t}U(t, t_1)\; U(t_1, t) = \frac{\partial}{\partial t}U(t, t_1)\; U^\dagger(t, t_1) = A(t, t_1) \]


as desired.

So now we have an operator $A$ which is anti-Hermitian and only depends on $t$: we'll multiply it by $i$ to make this into a Hermitian operator. Also, since $A$ has units of 1/time (because unitary operators $U$ have no units and we take a single time derivative), we can actually replace $A$ by $i\hbar A$ to get a Hermitian operator with units of energy. And now there isn't much more we have to do: if we define $H(t) = i\hbar A(t)$, then
\[ \frac{\partial}{\partial t}|\psi, t\rangle = A(t)\,|\psi, t\rangle \implies \boxed{\, i\hbar\frac{\partial}{\partial t}|\psi, t\rangle = H(t)\,|\psi, t\rangle \,} \]
and we've derived the Schrodinger equation! This is basically most of the information in the Schrodinger equation: it’s just unitary time evolution.
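The chain of statements above (the generator $(\partial U/\partial t)\,U^\dagger$ is anti-Hermitian, it does not depend on $t_0$, and $i\hbar$ times it is the Hamiltonian) can be tested numerically. This is only a sketch under stated assumptions: a two-level system with $H = (\hbar\omega/2)\sigma_x$, $\hbar = 1$, a central finite difference for the time derivative, and illustrative helper names.

```python
import math

HBAR = 1.0   # natural units (assumption)
OMEGA = 1.3  # illustrative frequency (assumption)

def U(t, t0):
    """Closed form of exp(-i H (t - t0) / hbar) for H = (hbar OMEGA / 2) sigma_x."""
    theta = OMEGA * (t - t0) / 2
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -1j * s], [-1j * s, c]]

def dagger(m):
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

def generator(t, t0, h=1e-5):
    """Central-difference estimate of (dU/dt)(t, t0) U^dagger(t, t0)."""
    up, um = U(t + h, t0), U(t - h, t0)
    dU = [[(up[i][j] - um[i][j]) / (2 * h) for j in range(2)] for i in range(2)]
    return matmul(dU, dagger(U(t, t0)))

A0 = generator(1.0, 0.0)   # generator computed with t0 = 0
A1 = generator(1.0, 0.7)   # same time t, different reference time t0

minus_A0_dagger = [[-x for x in row] for row in dagger(A0)]
anti_hermitian_error = max_abs_diff(A0, minus_A0_dagger)
t0_dependence = max_abs_diff(A0, A1)

H = [[1j * HBAR * A0[i][j] for j in range(2)] for i in range(2)]
H_expected = [[0, HBAR * OMEGA / 2], [HBAR * OMEGA / 2, 0]]
hamiltonian_error = max_abs_diff(H, H_expected)

print(anti_hermitian_error, t0_dependence, hamiltonian_error)
```

All three printed numbers are at the level of the finite-difference error, so within that accuracy the generator is anti-Hermitian, independent of the reference time, and reproduces the Hamiltonian we started from.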


Fact 187 


There's a clear correspondence between the operator A and the Poisson brackets from classical mechanics, which 


we should read about if we're curious. 


When we want to invent a quantum system, we don’t really know how to find the operator $U$, but we do know how to find the Hamiltonian $H$ from $U$: it’s just $i\hbar\frac{\partial U}{\partial t}(t, t_0)\,U^\dagger(t, t_0)$. And the Hamiltonians are nice: we know energy functionals of systems, so often we can write down an explicit $H$.

But we should also think about the opposite problem: how can we get U from H? It’s easier to invent a quantum 
system with H, but we do care about how the unitary evolution operator looks. 


To do that, first multiply both sides of the defining equation of $H$ by $U$ on the right: we have
\[ \boxed{\, i\hbar\frac{\partial U}{\partial t}(t, t_0) = H(t)\,U(t, t_0) \,} \]
(where the $U^\dagger$ and $U$ terms cancel on the right hand side above, and then we've switched around the two sides). There's no confusion between partial and total derivatives here, so we have
\[ i\hbar\frac{d}{dt}U(t, t_0) = H(t)\,U(t, t_0). \]

And we should be able to see the Schrodinger equation in here: if we put in a $|\psi, t_0\rangle$, the right side becomes $H(t)$ acting on $|\psi, t\rangle$, and the left hand side becomes $i\hbar\frac{d}{dt}|\psi, t\rangle$ (because we can bring $|\psi, t_0\rangle$ into the derivative). To solve further, there are three cases here:


* In our first case, $H$ is time-independent, so $H(t) = H$ for some operator $H$. Then
\[ i\hbar\frac{dU}{dt} = H U, \]
and we'll try to write down a solution of the form $U = e^{-iHt/\hbar}U_0$. Plugging this in, we find that
\[ \frac{dU}{dt} = \left(-\frac{iH}{\hbar}\right)e^{-iHt/\hbar}\,U_0. \]
Here, we've used the fact that $H$ doesn't depend on time: $H$ acts like a number, so the derivative just lets us take the $H$ out of the exponential (for example, we can imagine taking the power series expansion). So canceling constant terms, we find that the left side of our equation above is
\[ i\hbar\frac{dU}{dt} = H e^{-iHt/\hbar}\,U_0, \]
which is exactly $HU$ as desired! So we know that our unitary operator is
\[ U(t, t_0) = e^{-iHt/\hbar}\,U_0 \]
for some constant matrix $U_0$. Plugging in $t = t_0$, the operator should be the unit matrix, so $I = e^{-iHt_0/\hbar}U_0$, and thus $U_0 = e^{iHt_0/\hbar}$. Substituting everything back, we get our final answer:
\[ \boxed{\, U(t, t_0) = e^{-iH(t - t_0)/\hbar} \,} \]
as long as $H$ is time-independent. And if we have $U$ act on any energy eigenstate (which is an eigenstate of $H$), we can just substitute in the eigenvalue $E_n$: that is,
\[ e^{-iH(t - t_0)/\hbar}\,|\psi_n\rangle = e^{-iE_n(t - t_0)/\hbar}\,|\psi_n\rangle \]
as long as $H|\psi_n\rangle = E_n|\psi_n\rangle$.


* In our second case, $H$ has a little bit of time-dependence: we design this case so that it’s still possible to solve the equation. We'll assume that
\[ [H(t_1), H(t_2)] = 0 \]
for all $t_1, t_2$ (that is, the Hamiltonians at different times always commute). For example, a spin in a magnetic field has $H = -\gamma\, \mathbf{B}(t) \cdot \mathbf{S}$, and it’s possible to have a time-dependent magnetic field $\mathbf{B}(t)$. But if the direction is fixed, so we have something like $H = -\gamma B_z(t) S_z$, then the Hamiltonians at different times will commute because $S_z$ commutes with itself. (But later in the class, we'll do things like nuclear magnetic resonance, and then the system is more complicated than this.)


Well, the claim we have in this case is that $U(t, t_0)$ ends up being something generalized from the above case. We want to put $e^{-iHt/\hbar}$ like before, but the time derivative $\frac{dU}{dt}$ isn't quite so simple now because $H$ does depend on time. So we can fix this by trying an ansatz of
\[ U(t, t_0) = \exp\left[-\frac{i}{\hbar}\int_{t_0}^{t} dt'\, H(t')\right]. \]
(Notice that if $H$ is time-independent, this reduces to the result for $U(t, t_0)$ in the first case.) To verify this, we'll call the expression inside the exponential $R(t)$. We have
\[ \dot R(t) = -\frac{i}{\hbar}H(t) \]
by the fundamental theorem of calculus, and now we want to differentiate
\[ U = e^{R} = 1 + R + \frac{1}{2}RR + \frac{1}{3!}RRR + \cdots, \]
and this gives
\[ \frac{dU}{dt} = \dot R + \frac{1}{2}\big(\dot R R + R\dot R\big) + \frac{1}{3!}\big(\dot R R R + R \dot R R + R R \dot R\big) + \cdots, \]
but now $\dot R$ commutes with $R$, because $\dot R$ depends on $H(t)$, while $R$ is an integral of $H$s at other times, and we're assuming that the $H$s at different times commute. So the expression simplifies by moving all of the $\dot R$s to the left, and we just end up with
\[ \frac{dU}{dt} = \dot R\left(1 + R + \frac{1}{2}R^2 + \cdots\right) = \dot R\, e^{R} = -\frac{i}{\hbar}H(t)\,U, \]
which means the unitary operator $U$ that we've established is the correct one for time-evolution, as long as the $H$s commute at different times.


* In the general case, the idea is that $R$ and $\dot R$ may not commute with each other. There's not very much we can do, but there is one way to get something that makes sense.

Our answer will look like
\[ U(t, t_0) = T\exp\left[-\frac{i}{\hbar}\int_{t_0}^{t} dt'\, H(t')\right], \]
where $T\exp$ is defined to be the time-ordered exponential. Basically, we expand the exponential term by term: we start with the usual expansion, but then we change the limits of integration:
\[ = 1 + \left(-\frac{i}{\hbar}\right)\int_{t_0}^{t} dt_1\, H(t_1) + \left(-\frac{i}{\hbar}\right)^2\int_{t_0}^{t} dt_1\, H(t_1)\int_{t_0}^{t_1} dt_2\, H(t_2) + \cdots, \]
where the idea is that the second integral in the “squared” term always has $t_1 > t_2$. (With these nested limits each term covers only one time ordering, so the $1/n!$ factors of the ordinary exponential series do not appear.) The next term will be similar: it'll look like
\[ \left(-\frac{i}{\hbar}\right)^3\int_{t_0}^{t} dt_1\, H(t_1)\int_{t_0}^{t_1} dt_2\, H(t_2)\int_{t_0}^{t_2} dt_3\, H(t_3), \]
and we can check on our own that the time-derivative works exactly as it should: specifically, if this time-ordered exponential is our $U$, then $i\hbar\frac{dU}{dt}$ will end up being equal to $HU$.


So it's reassuring that a solution exists, but this isn't a very practical way to find a solution U. And when we do 
the rotating magnetic field problem for magnetic resonance, this isn’t what we'll be doing! But we'll see a bit 
more of this in 8.06. 
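For the second case (Hamiltonians commuting at different times), the exponential-of-an-integral formula can be tested directly, because $H = -\gamma B_z(t) S_z$ is diagonal and only its two diagonal entries need tracking. The field profile $B_z(t) = B_0\cos(\Omega t)$, all parameter values, $\hbar = 1$, and the function names below are illustrative assumptions.

```python
import cmath
import math

GAMMA, B0, DRIVE = 2.0, 0.5, 3.0  # illustrative values (assumptions); hbar = 1

def field(t):
    """Assumed field profile B_z(t) = B0 cos(DRIVE t)."""
    return B0 * math.cos(DRIVE * t)

def field_integral(t, t0):
    """Exact integral of B_z from t0 to t for the profile above."""
    return B0 * (math.sin(DRIVE * t) - math.sin(DRIVE * t0)) / DRIVE

def H_diag(t):
    """Diagonal entries of H(t) = -GAMMA B_z(t) S_z, with S_z = diag(1/2, -1/2)."""
    return [-GAMMA * field(t) / 2, GAMMA * field(t) / 2]

def U_diag(t, t0):
    """Diagonal entries of exp(-i * integral of H from t0 to t)."""
    phi = GAMMA * field_integral(t, t0) / 2
    return [cmath.exp(1j * phi), cmath.exp(-1j * phi)]

def schrodinger_residual(t, t0, h=1e-5):
    """Largest entry of i dU/dt - H(t) U, using a central difference."""
    up, um, u = U_diag(t + h, t0), U_diag(t - h, t0), U_diag(t, t0)
    hd = H_diag(t)
    return max(abs(1j * (up[k] - um[k]) / (2 * h) - hd[k] * u[k])
               for k in range(2))

t0, t1, t2 = 0.3, 1.0, 2.2
residual = schrodinger_residual(t2, t0)

u20, u21, u10 = U_diag(t2, t0), U_diag(t2, t1), U_diag(t1, t0)
composition_error = max(abs(u20[k] - u21[k] * u10[k]) for k in range(2))

print(residual, composition_error)
```

Both numbers come out at the level of numerical error. For a Hamiltonian whose direction rotates in time this shortcut would not apply, and the time-ordered expansion (or a different method) would be needed.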


And now we're ready for an alternate formulation of all of this, known as the Heisenberg picture of quantum mechanics. This isn’t something that we formulate on its own: the idea is to start with a Schrodinger picture in which we've already defined all of our operators $\hat x, \hat p, \hat S, \hat H$ and wavefunctions $\psi$, and we'll think about them in a new way.

To get started, we first consider a Schrodinger operator $A_S$ ($S$ stands for Schrodinger here). The motivation for the Heisenberg picture is that we have two independent time-dependent states $|\alpha, t\rangle$ and $|\beta, t\rangle$, and we might want to understand the quantity
\[ \langle\alpha, t|\,A_S\,|\beta, t\rangle. \]

We know, though, that we can represent the bra and ket vectors here by using unitary operators:
\[ = \langle\alpha, 0|\,U^\dagger(t, 0)\,A_S\,U(t, 0)\,|\beta, 0\rangle. \]

So instead of having time-dependence in $\alpha$ and $\beta$, we can use the unitary operators to say that we have a time-dependent operator $U^\dagger(t, 0)\,A_S\,U(t, 0)$ between the initial states. And this object is very important:


Definition 188 


The Heisenberg version of the Schrodinger operator $A_S$ is


\[ A_H(t) = U^\dagger(t, 0)\,A_S\,U(t, 0). \]


So any Schrodinger operator corresponds to a Heisenberg operator: we just act with $U^\dagger$ from the left and $U$ from the right (which is the natural way for operators to act on operators). There are a lot of things we can say about this new operator that we've established:

* At $t = 0$, $A_H(t = 0) = A_S$. This is because $U(t, 0)$ is the unitary time-evolution operator, and when $t = 0$, this is just the identity (nothing is changing). So our two operators (Schrodinger and Heisenberg) start off being exactly the same.


* The unit operator in the Heisenberg picture is
\[ U^\dagger(t, 0)\, I\, U(t, 0), \]
but $I$ doesn’t do anything and $U^\dagger$ and $U$ multiply to the identity, so indeed the unit operator doesn’t change in the Heisenberg picture.


* Suppose $C_S = A_S B_S$. Then we can find the Heisenberg operator for $C$:
\[ C_H = U^\dagger C_S U = U^\dagger A_S B_S U = U^\dagger A_S U\, U^\dagger B_S U, \]
and now this is just the product of the Heisenberg operators, $\boxed{C_H = A_H B_H}$! Similarly, this tells us that commutators also behave nicely:
\[ C_S = [A_S, B_S] \implies C_H = [A_H, B_H]. \]
The key thing to keep in mind is that
\[ [\hat x, \hat p] = i\hbar I \implies [\hat x_H(t), \hat p_H(t)] = i\hbar I \]
(because $i\hbar$ is just a constant, and the unit operator stays the same in the Heisenberg picture). So any commutation relation in Schrodinger is also a commutation relation in Heisenberg.


* Let’s now look at Hamiltonians: by definition, we have
\[ H_H(t) = U^\dagger(t, 0)\, H_S(t)\, U(t, 0). \]
If the Schrodinger Hamiltonian commutes at all times, meaning that $[H_S(t_1), H_S(t_2)] = 0$ for all $t_1, t_2$, then the unitary operator is built by an exponential in terms of $H_S$. But then we can move the $U^\dagger(t, 0)$ past the $H_S$, and we find that
\[ H_H(t) = H_S(t)\, U^\dagger(t, 0)\,U(t, 0) = H_S(t); \]
that is, the Schrodinger and Heisenberg Hamiltonians are equal if the Hamiltonians commute at all times. (We'll be able to check this in a nice example as well!)


Note that whenever $H_S(t)$ is a function of $\hat x$, $\hat p$, and $t$ (for example), we can turn it into a Heisenberg operator by putting a $U^\dagger$ on the left and a $U$ on the right. But then the $U$s will work their way inside: $\hat x$’s become Heisenberg $\hat x$’s, and so on. So what we're claiming is that
\[ H_H(t) = U^\dagger H_S(\hat x_S, \hat p_S, t)\, U = H_S(\hat x_H, \hat p_H, t), \]
which means that we get the Heisenberg Hamiltonian by replacing the Schrodinger variables with their Heisenberg versions. So if we’re in the case above where the Hamiltonians commute at all times, then
\[ H_S(\hat x_H, \hat p_H, t) = H_S(\hat x_S, \hat p_S, t): \]


somehow putting in Heisenberg operators into the Schrodinger Hamiltonian gives us exactly the same thing. This will be a useful identity, and we'll use it later on!

* Our last point is about expectation values: remember that we started this discussion with two arbitrary states $|\alpha, t\rangle$ and $|\beta, t\rangle$. If we set those states equal to $\psi$, we find that
\[ \boxed{\, \langle\psi, t|\,A_S\,|\psi, t\rangle = \langle\psi, 0|\,A_H(t)\,|\psi, 0\rangle \,}. \]
This is an equation which tells us that the Schrodinger operator’s expectation value at a time $t$ is just the Heisenberg operator’s expectation value in the $t = 0$ state, and we can write this schematically as
\[ \langle A_S\rangle = \langle A_H\rangle. \]
We just have to be careful to understand how to interpret this equation: on the left side, we’re using the time-dependent states, but on the right side, we're using the $t = 0$ states.
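Several of the properties in this list can be verified on a small example. The sketch below assumes a two-level system with $H_S = (\hbar\omega/2)\sigma_x$ and $\hbar = 1$, and uses the Pauli matrices as the Schrodinger operators. The model, the value of `OMEGA`, and the helper names are illustrative assumptions.

```python
import math

OMEGA = 1.3  # illustrative frequency (assumption); hbar = 1

SIGMA_Y = [[0, -1j], [1j, 0]]
SIGMA_Z = [[1, 0], [0, -1]]
H_S = [[0, OMEGA / 2], [OMEGA / 2, 0]]  # (OMEGA / 2) * sigma_x

def U(t):
    """U(t, 0) = exp(-i H_S t) in closed form for this H_S."""
    theta = OMEGA * t / 2
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -1j * s], [-1j * s, c]]

def dagger(m):
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(m, v):
    return [sum(row[k] * v[k] for k in range(2)) for row in m]

def heisenberg(a_s, t):
    """A_H(t) = U^dagger(t, 0) A_S U(t, 0)."""
    u = U(t)
    return matmul(matmul(dagger(u), a_s), u)

def expectation(state, op):
    """<state| op |state> for a two-component state."""
    op_state = matvec(op, state)
    return sum(s.conjugate() * x for s, x in zip(state, op_state))

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

t = 0.9
psi0 = [1, 0]                # spin up along z at t = 0
psi_t = matvec(U(t), psi0)   # Schrodinger-picture state at time t

schrodinger_value = expectation(psi_t, SIGMA_Z)
heisenberg_value = expectation(psi0, heisenberg(SIGMA_Z, t))

product_error = max_abs_diff(
    heisenberg(matmul(SIGMA_Y, SIGMA_Z), t),
    matmul(heisenberg(SIGMA_Y, t), heisenberg(SIGMA_Z, t)))
hamiltonian_error = max_abs_diff(heisenberg(H_S, t), H_S)

print(schrodinger_value, heisenberg_value)
print(product_error, hamiltonian_error)
```

For this example both expectation values equal $\cos(\omega t)$, the Heisenberg version of a product is the product of the Heisenberg versions, and the time-independent Hamiltonian is the same in both pictures.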


So far, these Heisenberg operators are a little bit difficult to work with (they're hard to calculate), so we want to find an equation satisfied by the Heisenberg operator. The reason for this is that we seldom know $U$, and even when we know it, it’s a bit difficult to do the simplification $U^\dagger A_S U$.

The idea is to calculate the quantity
\[ i\hbar\frac{d}{dt}A_H(t), \]
which is also equal to $i\hbar\frac{d}{dt}\big(U^\dagger A_S U\big)$. We should remember that the Schrodinger operator can have some explicit time dependence, so we should apply the product rule to all three terms: this is equal to
\[ i\hbar\frac{\partial U^\dagger}{\partial t}A_S U + i\hbar\, U^\dagger A_S\frac{\partial U}{\partial t} + i\hbar\, U^\dagger\frac{\partial A_S}{\partial t}U, \]
where $U$ is always $U(t, t_0)$ (with $t_0 = 0$ as in the definition above).


Remark 189. It may be confusing why we have partial derivatives in one expression and total derivatives in the other. The important thing to keep in mind is whether we're fixing the Heisenberg or the Schrodinger variables. For the first two terms in the product rule, it doesn’t matter whether we use a partial or total derivative (in both cases it's the same), but we need to use the partial derivative for the last term so that we're fixing Schrodinger observables, while we need to take the total derivative for the initial expression because $A_H$ is written in terms of Heisenberg variables, which can have some additional time dependence (and we want to fix Schrodinger variables throughout everything).


But we also know how to find the derivatives of $U$ and $U^\dagger$: since $i\hbar\frac{\partial U}{\partial t} = H_S U$, we also know that (taking the dagger of that equation and moving the negative sign over) $i\hbar\frac{\partial U^\dagger}{\partial t} = -U^\dagger H_S$. So plugging those in, we find that
\[ i\hbar\frac{d}{dt}A_H = -U^\dagger H_S A_S U + U^\dagger A_S H_S U + i\hbar\, U^\dagger\frac{\partial A_S}{\partial t}U. \]

(The last term will be 0 if $A_S$ doesn't have any explicit time dependence, so we'll just leave it as it is and rewrite it as its Heisenberg version.) And now we can turn the first two terms into their Heisenberg versions as well:
\[ \boxed{\, i\hbar\frac{d}{dt}A_H(t) = [A_H, H_H] + i\hbar\left(\frac{\partial A_S}{\partial t}\right)_H \,}. \]

This is the Heisenberg equation of motion, and solving this differential equation is often the simplest way we can calculate $A_H$! Again, we'll consider some particular cases:


* If $A_S$ has no explicit time dependence, then the second term disappears because $\frac{\partial A_S}{\partial t} = 0$, and the equation becomes
\[ i\hbar\frac{dA_H}{dt} = [A_H, H_H]. \]
And the Heisenberg Hamiltonian is simple: if the Schrodinger Hamiltonian is time-independent (or commutes with itself at different times), then $H_H = H_S$. But we'll leave it as $H_H$ so that we have an equation written in terms of Heisenberg operators.


Now suppose we want to compute how the expectation value of a Schrodinger operator depends on time: we can calculate
\[ i\hbar\frac{d}{dt}\langle\psi, t|\,A_S\,|\psi, t\rangle = i\hbar\frac{d}{dt}\langle\psi, 0|\,A_H(t)\,|\psi, 0\rangle. \]
Now putting the derivative between the bra and ket, we get
\[ \langle\psi, 0|\; i\hbar\frac{dA_H}{dt}\;|\psi, 0\rangle, \]
and assuming still that $A_S$ has no explicit time-dependence, this is equal to
\[ \langle\psi, 0|\,[A_H, H_H]\,|\psi, 0\rangle. \]

So the time derivative of the expectation value satisfies
\[ \boxed{\, i\hbar\frac{d}{dt}\langle A_H(t)\rangle = \langle [A_H, H_H]\rangle \,}. \]

We said that Heisenberg expectation values are the same as Schrodinger expectation values, so this can also be written as
\[ \boxed{\, i\hbar\frac{d}{dt}\langle A_S\rangle = \langle [A_S, H_S]\rangle \,}, \]
which we derived a few classes ago. So this means that the expectation values of Schrodinger operators are the same as the expectation values of Heisenberg operators, except that we take the states at $t = 0$ in the latter case.


- Now consider the case where $A_S$ is time-independent and conserved (meaning that it commutes with the Schrodinger Hamiltonian). Then $[A_S, H_S] = 0$, which also means that $[A_H, H_H] = 0$, which means that

\[ \frac{d A_H}{dt} = 0. \]

So the Heisenberg operator is also time-independent: if the Schrodinger operator has no $t$'s and is conserved, the Heisenberg operator doesn't actually have any $t$'s either.
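As a quick illustration of the Heisenberg equation of motion in the simplest possible setting, here is a sketch for a free particle. The choice $H_S = \hat p^2/2m$ is an assumption made purely for illustration (it is not an example from the lecture), and the only input is $[\hat x, \hat p] = i\hbar$.

```latex
% Free particle (illustrative assumption): H_S = \hat p^2 / 2m is time-independent, so H_H = H_S.
\begin{align*}
i\hbar \frac{d\hat p_H}{dt} &= [\hat p_H, H_H] = 0
  &&\Longrightarrow\quad \hat p_H(t) = \hat p, \\
i\hbar \frac{d\hat x_H}{dt} &= \Big[\hat x_H, \frac{\hat p_H^2}{2m}\Big] = \frac{i\hbar}{m}\,\hat p_H
  &&\Longrightarrow\quad \hat x_H(t) = \hat x + \frac{\hat p}{m}\,t .
\end{align*}
```

Here $\hat p$ is conserved, so its Heisenberg operator has no $t$ in it, matching the second case above, while $\hat x_H$ grows linearly in time just like the classical position.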


We'll finish this class with a nice example: 


Example 190 


Consider the harmonic oscillator with the Schrodinger Hamiltonian

\[ H_S = \frac{\hat p^2}{2m} + \frac{1}{2} m\omega^2 \hat x^2, \]

where $\hat x$ and $\hat p$ are the usual Schrodinger operators.

Now the Heisenberg Hamiltonian should be identical, because we have a time-independent Hamiltonian. But we'll write it in general first, where we have $U^\dagger$ and $U$ coming in from the left and right: we get that

\[ H_H = \frac{\hat p_H^2}{2m} + \frac{1}{2} m\omega^2 \hat x_H^2. \]


We'll check that this is time-independent, but first we need to evaluate the operators $\hat x_H$ and $\hat p_H$. The most straightforward way is to plug in the $e^{-iHt/\hbar}$ operators and multiply everything through, but this is a little bit complicated: we'll use the Heisenberg equation of motion instead! Since $\hat x$ and $\hat p$ are time-independent Schrodinger operators,

\[ i\hbar \frac{d\hat x_H}{dt} = [\hat x_H, H_H] = \Big[ \hat x_H, \frac{\hat p_H^2}{2m} \Big]. \]

And now this commutator gives us $\frac{1}{2m} \cdot 2\hat p_H\, [\hat x_H, \hat p_H] = \frac{i\hbar}{m} \hat p_H$, and plugging this in yields

\[ \boxed{\, \frac{d\hat x_H}{dt} = \frac{\hat p_H}{m} \,} \]


So this looks like an equation in classical mechanics, $v = \frac{p}{m}$, and that's another good point of Heisenberg equations of motion: they look like ordinary dynamical variable equations! Similarly, we find that

\[ i\hbar \frac{d\hat p_H}{dt} = [\hat p_H, H_H] = \frac{1}{2} m\omega^2\, [\hat p_H, \hat x_H^2] = m\omega^2\, \hat x_H \cdot (-i\hbar), \]

and thus we have

\[ \boxed{\, \frac{d\hat p_H}{dt} = -m\omega^2 \hat x_H \,} \]


We can now solve for these in the same way that we do classically: taking a second derivative of the first boxed equation, we have that

\[ \frac{d^2 \hat x_H}{dt^2} = \frac{1}{m} \frac{d\hat p_H}{dt} = \frac{1}{m} \left( -m\omega^2 \hat x_H \right), \]

so we just get the simple harmonic oscillator equation of motion

\[ \frac{d^2 \hat x_H}{dt^2} = -\omega^2 \hat x_H. \]

Here, what we should notice about the Heisenberg picture is that we're solving for the Heisenberg operators, which tells us about the time evolution of all states at the same time! So we know that we have

\[ \hat x_H = A \cos(\omega t) + B \sin(\omega t), \]

and then similarly the momentum

\[ \hat p_H = m \frac{d\hat x_H}{dt} = -m\omega \sin(\omega t)\, A + m\omega \cos(\omega t)\, B. \]


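But we can figure out what these operators $A$ and $B$ are: at time $t = 0$, the Heisenberg operators should be identical to the Schrodinger operators, so $\hat x_H(0) = A = \hat x$ (the Schrodinger operator), and $\hat p_H(0) = m\omega B = \hat p$. So we know that $B = \frac{\hat p}{m\omega}$, and now we get our equations:

\[ \boxed{\, \hat x_H(t) = \hat x \cos(\omega t) + \frac{\hat p}{m\omega} \sin(\omega t) \,}, \qquad \boxed{\, \hat p_H(t) = \hat p \cos(\omega t) - m\omega\, \hat x \sin(\omega t) \,}. \]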


So now any expectation of a combination of $\hat x_H$ and $\hat p_H$ can be found by plugging things in at time $t = 0$! So now we can find the Heisenberg Hamiltonian:

\[ H_H = \frac{1}{2m} \hat p_H^2 + \frac{1}{2} m\omega^2 \hat x_H^2 = \frac{1}{2m} \big( \hat p \cos(\omega t) - m\omega\, \hat x \sin(\omega t) \big)^2 + \frac{1}{2} m\omega^2 \Big( \hat x \cos(\omega t) + \frac{\hat p}{m\omega} \sin(\omega t) \Big)^2, \]

and expanding out yields

\begin{align*}
H_H = {} & \frac{1}{2m} \cos^2(\omega t)\, \hat p^2 + \frac{1}{2} m\omega^2 \sin^2(\omega t)\, \hat x^2 - \frac{1}{2} \omega \sin(\omega t) \cos(\omega t)\, (\hat p \hat x + \hat x \hat p) \\
& + \frac{1}{2m} \sin^2(\omega t)\, \hat p^2 + \frac{1}{2} m\omega^2 \cos^2(\omega t)\, \hat x^2 + \frac{1}{2} \omega \cos(\omega t) \sin(\omega t)\, (\hat x \hat p + \hat p \hat x).
\end{align*}

And now everything cancels very nicely: the $\hat p^2$ coefficients evaluate to $\frac{1}{2m} (\cos^2 \omega t + \sin^2 \omega t) = \frac{1}{2m}$, and the $\hat x^2$ coefficients evaluate to $\frac{1}{2} m\omega^2$ similarly. The cross-terms cancel, and we get a final expression which is identical to the Schrodinger Hamiltonian! So when we substitute the expressions for the Heisenberg operators into the Heisenberg Hamiltonian, we get an expression which is exactly equal to the Schrodinger Hamiltonian.
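The cancellation above can also be checked numerically. The following is a minimal sketch, not part of the lecture: it tracks only the coefficients of $\hat p^2$, $\hat x^2$, and the symmetric combination $\hat x\hat p + \hat p\hat x$ after substituting the Heisenberg operators, and the function name and the values of `m` and `omega` are arbitrary choices made for illustration.

```python
import math

def heisenberg_hamiltonian_coefficients(m, omega, t):
    """Coefficients of p^2, x^2 and (xp + px) in H_H after substitution."""
    c, s = math.cos(omega * t), math.sin(omega * t)
    # x_H = c * x + (s / (m omega)) * p,   p_H = c * p - (m omega s) * x
    # Contribution of p_H^2 / (2m):
    pp_kin = c * c / (2 * m)
    xx_kin = (m * omega * s) ** 2 / (2 * m)
    sym_kin = -(m * omega * s * c) / (2 * m)
    # Contribution of (1/2) m omega^2 x_H^2:
    k = 0.5 * m * omega ** 2
    pp_pot = k * (s / (m * omega)) ** 2
    xx_pot = k * c * c
    sym_pot = k * c * s / (m * omega)
    return pp_kin + pp_pot, xx_kin + xx_pot, sym_kin + sym_pot

m, omega = 1.7, 2.3   # arbitrary illustrative values
for t in (0.0, 0.4, 1.0, 5.0):
    pp, xx, sym = heisenberg_hamiltonian_coefficients(m, omega, t)
    print(t, pp, xx, sym)
```

At every time the first two coefficients should agree with $\frac{1}{2m}$ and $\frac{1}{2}m\omega^2$, and the cross-term coefficient should vanish up to rounding.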


20 March 4, 2020 


(Two) practice tests for the exam next week will be posted, so that we can get an idea of what it will be like. Solving 
problems from previous tests is a good way to study; another one is to review past homework and recitation material! 
And it is good to read the lecture notes as well if we've only been using the videos. (By the way, the material for 
today’s lecture on the Heisenberg picture will not be on the exam — we just need to understand concepts up to problem 
set 4.) 


Problem 191 


Suppose A and B are Hermitian operators. How can we check if they are simultaneously diagonalizable? 


We know that Hermitian operators are normal, and all normal operators are diagonalizable. (In fact, they are 
unitarily diagonalizable, so we can get an orthonormal basis.) So we just need to know whether the eigenvectors line 
up. 

Well, the most important thing is to check the commutator $[A, B] = AB - BA$. If this is equal to zero, it will be equal to zero after any change of basis: writing

\[ A' = P^{-1} A P, \quad B' = P^{-1} B P \implies A'B' = P^{-1} A P P^{-1} B P = P^{-1} A B P. \]

So $[A', B'] = P^{-1} [A, B] P$, and thus one commutator is zero if and only if the other is zero. Because diagonal matrices commute, this means that if $A$ and $B$ are simultaneously diagonalizable, they must commute in the original basis as well.

On the other hand, if $A$ and $B$ do not commute, then we cannot end up with a zero commutator in any basis. Thus the two operators are not simultaneously diagonalizable.
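The basis-independence of the commutator test can be illustrated with a small sketch. The matrices below are hypothetical examples (two diagonal matrices viewed in a rotated basis, and the Pauli matrices $\sigma_z$, $\sigma_x$), and the helper names are made up for this illustration.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(len(a))] for i in range(len(a))]

def is_zero(mat, tol=1e-12):
    return all(abs(x) < tol for row in mat for x in row)

def rotate(a, theta):
    """Return P^{-1} a P for the 2x2 rotation P by angle theta (so P^{-1} = P^T)."""
    c, s = math.cos(theta), math.sin(theta)
    p = [[c, -s], [s, c]]
    p_inv = [[c, s], [-s, c]]
    return matmul(p_inv, matmul(a, p))

# Two diagonal (hence commuting) Hermitian matrices, viewed in a rotated basis.
A = rotate([[1.0, 0.0], [0.0, 2.0]], 0.7)
B = rotate([[3.0, 0.0], [0.0, -1.0]], 0.7)
# Pauli matrices sigma_z and sigma_x, which do not commute.
sigma_z = [[1.0, 0.0], [0.0, -1.0]]
sigma_x = [[0.0, 1.0], [1.0, 0.0]]

print(is_zero(commutator(A, B)))              # commuting pair
print(is_zero(commutator(sigma_z, sigma_x)))  # non-commuting pair
```

Neither `A` nor `B` looks diagonal in the rotated basis, but their commutator still vanishes, which is the signal that a common eigenbasis exists (here, undoing the rotation recovers it).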


Remark 192. Using this same idea, note that an operator proportional to the identity looks the same in all bases, because

\[ P^{-1} (cI) P = c P^{-1} P = cI. \]

So if we're given two operators that commute, how should we proceed? The idea is that we should check the eigenvalues of each matrix, and try to work with the one with fewer degeneracies (ideally none)! (This is because having no degeneracy in eigenvalues makes it clearer what the (simultaneous) eigenvectors are.) Remember that if an eigenvalue shows up twice for our operator $A$, there is a whole plane of eigenvectors that would all work. However, it takes some work to see which of those actually diagonalize the other operator $B$. Specifically, if we didn't pick the right ones, we'd have some invariant subspace of dimension 2 for $B$, on which we'd have to diagonalize $B$ separately to give us the final answer.


Remark 193. Remember that a lot of quantities and properties for our operators are basis-independent: for example, 


the eigenvalues of A or whether A is Hermitian do not depend on our choice of basis.

We'll continue on with some practice for a different topic:


Problem 194 


We have the uncertainty inequality

\[ \Delta A\, \Delta B \ge \left| \left\langle \Psi \right| \frac{1}{2i} [A, B] \left| \Psi \right\rangle \right| \]

(we can put the hats above the operators if we want, but it's not that important). When is this inequality saturated, meaning we have an equality case?


This was analyzed in the lecture videos: it requires

\[ (B - \langle B \rangle) |\Psi\rangle = i\gamma\, (A - \langle A \rangle) |\Psi\rangle. \]

At the end of the day, we care about the state $|\Psi\rangle$ for which the inequality is indeed saturated. And we also need to find $\gamma$, which is some real number. But there's more we don't know here: the expectation values $\langle A \rangle$ and $\langle B \rangle$ are also taken with respect to the state $\Psi$!

So if we expand out the equation in terms of $\Psi$, we have cubic terms:

\[ B\Psi - \langle \Psi | B | \Psi \rangle \Psi = i\gamma \left( A\Psi - \langle \Psi | A | \Psi \rangle \Psi \right). \]

But we can be a bit clever. If we want to solve the equation $A\Psi = \langle A \rangle \Psi$, this looks ugly to expand out. But we can instead first find the eigenvalues of $A$ by treating $\langle A \rangle$ as an unknown number, and in the same way we can just think of $\langle A \rangle$ and $\langle B \rangle$ here as numbers $a, b$. (It's not true that $\Psi$ is an eigenvector of $A$ and $B$, but this is showing us a general method.) And now our equation just becomes

\[ (B - b) |\Psi\rangle = i\gamma\, (A - a) |\Psi\rangle. \]

And now we can rearrange some terms:

\[ \boxed{\, (B - i\gamma A) |\Psi\rangle = (b - i\gamma a) |\Psi\rangle \,} \]

And now we have something that looks like an eigenvector equation! The operator $B - i\gamma A$ is not Hermitian, so eigenvalues can be complex (not just real). But now let's apply $\langle \Psi |$ from the left (with $\Psi$ normalized): this now gives us

\[ \langle B \rangle - i\gamma \langle A \rangle = b - i\gamma a. \]

And now equating real and imaginary parts ($A, B$ are Hermitian, and $a, b, \gamma$ are real), we find that we indeed have $b = \langle B \rangle, a = \langle A \rangle$. So it's okay to replace the expectation values with $a, b$, because they will turn out to be equal to the expectation values!


We have likely seen the concept of coherent states in the harmonic oscillator:

\[ \hat a |\alpha\rangle = \alpha |\alpha\rangle, \qquad \hat a \propto \hat x + \frac{i\hat p}{m\omega}. \]

Here, $\hat a$ (and $\hat a^\dagger$) are non-Hermitian operators, so when we solve for coherent states, we're solving this kind of boxed equation above.
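Remark 195. By the way, if we take the norms of our uncertainty saturation relation, we find that

\[ \Delta B = |\gamma|\, \Delta A. \]

This is something we solve for later on: we shouldn't write $\gamma$ in terms of $\frac{\Delta B}{\Delta A}$ in our equation.

In the homework, we do this with the operators $A = \hat x, B = \hat p$. Then we have

\[ (\hat p - i\gamma \hat x) \Psi = (p_0 - i\gamma x_0) \Psi, \]

so in coordinate space,

\[ \frac{\hbar}{i} \frac{d\Psi}{dx} - i\gamma x\, \Psi(x) = (p_0 - i\gamma x_0) \Psi \implies \frac{d\Psi}{dx} = \frac{i}{\hbar} \left( i\gamma x + p_0 - i\gamma x_0 \right) \Psi. \]

We can now write this in a separable form

\[ \frac{d\Psi}{\Psi} = \left( \frac{i p_0}{\hbar} - \frac{\gamma}{\hbar} (x - x_0) \right) dx, \]

and then integrating both sides will give us (up to an additive constant)

\[ \log \Psi = \frac{i p_0 x}{\hbar} - \frac{\gamma}{2\hbar} (x - x_0)^2. \]

And thus $\Psi$ is a Gaussian (normalizable when $\gamma > 0$), and these are exactly the states that saturate the uncertainty! They take the form

\[ \Psi(x) = C\, e^{i p_0 x/\hbar}\, e^{-\gamma (x - x_0)^2/(2\hbar)}. \]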
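As a sanity check that such a Gaussian really saturates $\Delta x\, \Delta p \ge \hbar/2$, here is a minimal numerical sketch. It assumes units with $\hbar = 1$ and arbitrary illustrative values of $\gamma$, $x_0$, $p_0$; the function name is made up, and the derivative of $\Psi$ is taken from the first-order equation above rather than by finite differences.

```python
import math

def gaussian_uncertainties(gamma, x0, p0, hbar=1.0, half_width=12.0, n=4000):
    """Return (delta_x, delta_p) for Psi ~ exp(i p0 x/hbar) exp(-gamma (x-x0)^2/(2 hbar))."""
    sigma = math.sqrt(hbar / (2 * gamma))      # expected width, used only to size the grid
    a, b = x0 - half_width * sigma, x0 + half_width * sigma
    dx = (b - a) / n
    norm = mean_x = mean_x2 = mean_p = mean_p2 = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx                  # midpoint rule
        rho = math.exp(-gamma * (x - x0) ** 2 / hbar)   # |Psi|^2 up to normalization
        # (hbar/i) Psi' = (p0 + i gamma (x - x0)) Psi, so
        #   Psi^* (hbar/i) Psi' = (p0 + i gamma (x - x0)) rho   (imaginary part integrates to 0)
        #   |hbar Psi'|^2       = (p0^2 + gamma^2 (x - x0)^2) rho
        norm += rho * dx
        mean_x += x * rho * dx
        mean_x2 += x * x * rho * dx
        mean_p += p0 * rho * dx
        mean_p2 += (p0 ** 2 + gamma ** 2 * (x - x0) ** 2) * rho * dx
    mean_x, mean_x2, mean_p, mean_p2 = (v / norm for v in (mean_x, mean_x2, mean_p, mean_p2))
    return math.sqrt(mean_x2 - mean_x ** 2), math.sqrt(mean_p2 - mean_p ** 2)

dx_, dp_ = gaussian_uncertainties(gamma=2.5, x0=0.3, p0=1.1)
print(dx_ * dp_)   # compare with hbar / 2 = 0.5
```

The individual widths come out as $\Delta x = \sqrt{\hbar/(2\gamma)}$ and $\Delta p = \sqrt{\hbar\gamma/2}$, so $\gamma$ controls how the fixed product $\hbar/2$ is split between position and momentum, consistent with $\Delta B = |\gamma|\, \Delta A$.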


21 Coherent States 


First of all, we'll do a little bit of review. We learned how to calculate the Heisenberg operators, where we subject a Schrodinger operator to the transformation $U^\dagger(t, 0) A_S U(t, 0)$. The resulting operator $A_H$ has a few important properties: if the Hamiltonian $H$ is time-independent, we just have $A_H = e^{iHt/\hbar} A_S e^{-iHt/\hbar}$, and in general we have the Heisenberg equations of motion to help us solve for $A_H$. Our main achievement of last time was developing a formula for the time-development of $\hat x_H$ and $\hat p_H$ for the harmonic oscillator, and we'll see today that these Heisenberg operators contain all of the information about the dynamics of this whole system.


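Remark 196. We should read up on creation and annihilation operators: the idea is that $\hat a$ and $\hat a^\dagger$ are linear combinations of $\hat x$ and $\hat p$, so as Schrodinger operators they also have no time dependence, and in the Heisenberg picture they just pick up a phase: we have $\hat a_H(t) = e^{-i\omega t} \hat a$ and $\hat a_H^\dagger(t) = e^{i\omega t} \hat a^\dagger$.

The important thing is that we can write $\hat x$ and $\hat p$ as linear combinations of $\hat a$ and $\hat a^\dagger$ as well:

\[ \hat x = \sqrt{\frac{\hbar}{2m\omega}}\, (\hat a + \hat a^\dagger), \qquad \hat p = i \sqrt{\frac{m\omega\hbar}{2}}\, (\hat a^\dagger - \hat a). \]

And indeed, if we take the Heisenberg version of these equations (making every $\hat x$ into an $\hat x_H$ and so on), they still hold, and then we can substitute in our time-dependent creation and annihilation operators and we'll recover the familiar

\[ \hat x_H(t) = \hat x \cos \omega t + \frac{\hat p}{m\omega} \sin \omega t, \qquad \hat p_H(t) = \hat p \cos \omega t - m\omega\, \hat x \sin \omega t. \]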


Today, we'll use these concepts to understand the coherent states of the harmonic oscillator. The motivation here 
is that in any energy eigenstate of the harmonic oscillator, operators have constant expectation values! So if we ask 
about the position or momentum or other property of our particle, it will look the same at all times. This is still an 
interesting state, but we want to construct quantum mechanical states that behave somewhat classically. And 


this will have applications to light and photons soon!

The first step we'll take is to understand translation operators. We'll start with the unitary translation operator

\[ T_{x_0} \equiv e^{-i \hat p x_0/\hbar}; \]

we've seen these operators a lot in the homework. This is unitary, because $x_0$ is a real number and $\hat p$ is a Hermitian operator, so the exponent $-i\hat p x_0/\hbar$ is anti-Hermitian. And then the exponential of any anti-Hermitian operator is unitary.


The reason these operators are particularly nice is that the multiplication of two such operators is

\[ T_{x_0} T_{y_0} = e^{-i\hat p x_0/\hbar}\, e^{-i\hat p y_0/\hbar}, \]

and now the two operators in the exponents are just multiples of each other, so they commute. Thus this is just

\[ = e^{-i\hat p (x_0 + y_0)/\hbar} = T_{x_0 + y_0}. \]

So we don't need Campbell-Baker-Hausdorff here! We can also get a simple expression for the inverse: plugging in $y_0 = -x_0$, we find that

\[ T_{x_0} T_{-x_0} = T_0 = 1, \]


so the operators $T_{x_0}$ and $T_{-x_0}$ are inverses. But to get more intuition, we need to do a bit more computation by having our operator act on $\hat x$ and $\hat p$. That is, we want to compute the two quantities

\[ T_{x_0}^\dagger\, \hat x\, T_{x_0}, \qquad T_{x_0}^\dagger\, \hat p\, T_{x_0}. \]

We've already shown in our own work that these are actually $\hat x + x_0 I$ and $\hat p$, respectively (the second expression is simple because everything commutes). So if we have a state $\psi$, we can ask for the expectation value $\langle \hat x \rangle_\psi$, and if this is a particle that is somewhat localized, the expectation will be roughly where the particle is. But then we can also compute $\langle \hat x \rangle_{T_{x_0} \psi}$ (that is, what $T_{x_0}$ is really doing), and this expectation is

\[ \langle \psi | T_{x_0}^\dagger\, \hat x\, T_{x_0} | \psi \rangle = \langle \psi | (\hat x + x_0) | \psi \rangle = \langle \hat x \rangle_\psi + x_0. \]


In other words, the expectation value in the new state $T_{x_0} \psi$ is the expectation value in $\psi$, except we've translated everything by a displacement of $x_0$. (And that explains the name “translation operator”!) We should re-verify that we have

\[ T_{x_0} |x\rangle = |x + x_0\rangle, \qquad T_{x_0} |\psi\rangle \leftrightarrow \psi(x - x_0), \]

because $\psi(x - x_0)$ is the wave function translated to the right by $x_0$ units (by function transformation rules).
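To see the translation property concretely: in the position representation $\hat p = \frac{\hbar}{i} \frac{d}{dx}$, so $T_{x_0} = e^{-x_0\, d/dx}$, and acting on a function this is just a Taylor series. The sketch below applies that series to a polynomial, where it terminates exactly. The polynomial is a hypothetical test function (not normalizable, so it only illustrates the action on functions), and the helper names are made up for this example.

```python
import math

def derivative(coeffs):
    """Derivative of a polynomial given as [c0, c1, c2, ...] (c_k multiplies x**k)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))

def translate(coeffs, x0):
    """Apply exp(-x0 d/dx) = sum_k (-x0)^k / k! (d/dx)^k to a polynomial."""
    result = [0.0] * len(coeffs)
    term = list(coeffs)            # k-th derivative of the polynomial
    k = 0
    while term:                    # the series terminates for a polynomial
        factor = (-x0) ** k / math.factorial(k)
        for i, c in enumerate(term):
            result[i] += factor * c
        term = derivative(term)
        k += 1
    return result

psi = [1.0, -2.0, 0.5, 3.0]        # hypothetical test function 1 - 2x + 0.5x^2 + 3x^3
x0 = 0.8
shifted = translate(psi, x0)
for x in (-1.0, 0.0, 2.5):
    print(evaluate(shifted, x), evaluate(psi, x - x0))
```

The two printed columns agree, which is the statement $(T_{x_0}\psi)(x) = \psi(x - x_0)$.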
So now we can use this translation operator to get our coherent states. We'll start by taking the ground state of 


the harmonic oscillator, and we'll displace it by some $x_0$: the resulting state looks like

\[ |\tilde x_0\rangle \equiv T_{x_0} |0\rangle = e^{-i\hat p x_0/\hbar} |0\rangle. \]


Intuitively, we should imagine the ground state wave function in the harmonic oscillator potential, except we translate 
it so that its center is at some position xo instead of 0. We have no time dependence so far — we'll first understand a 


few more properties of this state, and then we'll time-evolve it. 


Example 197 


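What is the value of $\langle \tilde x_0 | \tilde x_0 \rangle$?

We should be careful to note that this is not a position eigenstate, so we can't use the formula $\langle x | y \rangle = \delta(x - y)$. But we do know that $|\tilde x_0\rangle$ is the result of a unitary operator acting on $|0\rangle$, which preserves length. Thus,

\[ \langle \tilde x_0 | \tilde x_0 \rangle = \langle 0 | 0 \rangle = 1. \]

The wave function associated to this state is then $\psi_0(x - x_0)$, where $\langle x | 0 \rangle = \psi_0(x)$ is the ground state wave function.

We can now proceed with a few other calculations: if we want to compute the expectation value of any operator $A$ on our coherent state, we can rewrite it as an expectation value in the vacuum, so

\[ \langle \tilde x_0 | A | \tilde x_0 \rangle = \langle 0 | T_{x_0}^\dagger A\, T_{x_0} | 0 \rangle. \]

Basically, we're tracing the problem back to what the regular ground state (vacuum) is doing. For example, if $A = \hat x$, we have that

\[ \langle \tilde x_0 | \hat x | \tilde x_0 \rangle = \langle 0 | T_{x_0}^\dagger\, \hat x\, T_{x_0} | 0 \rangle = \langle 0 | (\hat x + x_0) | 0 \rangle = x_0, \]

as we expect, and similarly

\[ \langle \tilde x_0 | \hat p | \tilde x_0 \rangle = \langle 0 | T_{x_0}^\dagger\, \hat p\, T_{x_0} | 0 \rangle = \langle 0 | \hat p | 0 \rangle = 0. \]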


Putting these together, we can find the expectation value of the Hamiltonian: because the $\hat p$ is unchanged while the $\hat x$ becomes $(\hat x + x_0)$, we have

\[ \langle \tilde x_0 | H | \tilde x_0 \rangle = \langle 0 | \left( \frac{\hat p^2}{2m} + \frac{1}{2} m\omega^2 (\hat x + x_0)^2 \right) | 0 \rangle. \]

To avoid computing too hard, we can take the original Hamiltonian and separate it from the other terms here: this evaluates to

\[ = \langle 0 | H | 0 \rangle + \langle 0 | \frac{1}{2} m\omega^2 (2 x_0 \hat x) | 0 \rangle + \frac{1}{2} m\omega^2 x_0^2. \]

The middle term here is a constant times the vacuum expectation of $\hat x$, so it is just zero, and then the last term is just some constant. Putting everything together and using that the expectation of the Hamiltonian in the vacuum (ground state) is $\frac{\hbar\omega}{2}$, we have that

\[ \langle \tilde x_0 | H | \tilde x_0 \rangle = \frac{1}{2} \hbar\omega + \frac{1}{2} m\omega^2 x_0^2. \]

But this is now looking very classical: the expectation value of the energy is a small quantum term, plus the cost of stretching everything out to $x_0$, which is $\frac{1}{2} k x_0^2$ with $k = m\omega^2$. In other words, for large enough $x_0$, we can think of the second term as being the “cost” of having the particle being off to the side in a potential!


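Remark 198. As small exercises, it's worth calculating that

\[ \langle \tilde x_0 | \hat x^2 | \tilde x_0 \rangle = x_0^2 + \frac{\hbar}{2m\omega}, \qquad \langle \tilde x_0 | \hat p^2 | \tilde x_0 \rangle = \frac{m\hbar\omega}{2}, \qquad \langle \tilde x_0 | \hat x \hat p + \hat p \hat x | \tilde x_0 \rangle = 0. \]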
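The first and third of these exercises can be sketched by moving the translation operators onto the operators, exactly as in the computation of the energy above. The inputs assumed here are the standard ground-state values $\langle 0|\hat x|0\rangle = \langle 0|\hat p|0\rangle = 0$ and $\langle 0|\hat x^2|0\rangle = \frac{\hbar}{2m\omega}$, together with the identity $\hat x\hat p + \hat p\hat x = i\hbar\big((\hat a^\dagger)^2 - \hat a^2\big)$, which follows from writing $\hat x$ and $\hat p$ in terms of $\hat a$ and $\hat a^\dagger$.

```latex
\begin{align*}
\langle \tilde x_0 | \hat x^2 | \tilde x_0 \rangle
  &= \langle 0 | (\hat x + x_0)^2 | 0 \rangle
   = \langle 0 | \hat x^2 | 0 \rangle + 2 x_0 \langle 0 | \hat x | 0 \rangle + x_0^2
   = \frac{\hbar}{2m\omega} + x_0^2, \\
\langle \tilde x_0 | \hat x \hat p + \hat p \hat x | \tilde x_0 \rangle
  &= \langle 0 | (\hat x \hat p + \hat p \hat x) | 0 \rangle + 2 x_0 \langle 0 | \hat p | 0 \rangle
   = i\hbar\, \langle 0 | \big( (\hat a^\dagger)^2 - \hat a^2 \big) | 0 \rangle + 0 = 0 .
\end{align*}
```

The middle exercise is the quickest: $\hat p$ is untouched by the translation, so $\langle \hat p^2 \rangle$ is just its ground-state value.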

The idea we're approaching now is that of time-evolution: it's going to turn out that even though this coherent state is not an energy eigenstate, the wavefunction will not actually change shape. It'll just move back and forth! This is surprising (usually superimposing even just two energy eigenstates will change the shape), but this is an exceptional case. Let's use the notation that a state $|\tilde x_0\rangle$ looks like $|\tilde x_0, t\rangle$ at some time $t$: to explore what this looks like, we'll take some expectation values with the Heisenberg operator:

\[ \langle A \rangle_t = \langle \tilde x_0, t | A | \tilde x_0, t \rangle = \langle \tilde x_0 | A_H(t) | \tilde x_0 \rangle, \]

and if we wanted, we could also write this as $\langle 0 | T_{x_0}^\dagger A_H(t) T_{x_0} | 0 \rangle$, so everything can be computed from expectation values on the vacuum.

Example 199

What is the expectation value of $\hat x$ as a function of time on this $\tilde x_0$ coherent state?


We plug in the Heisenberg operator $\hat x_H$ to find that

\[ \langle \hat x \rangle_{\tilde x_0}(t) = \langle \tilde x_0 | \left( \hat x \cos \omega t + \frac{\hat p}{m\omega} \sin \omega t \right) | \tilde x_0 \rangle \]

(remember that the key idea is to evolve the operators, not the states), and now we know the expectation values of $\hat x$ and $\hat p$ in the coherent state: they're $x_0$ and $0$ respectively, so this just evaluates to

\[ \boxed{\, \langle \hat x \rangle_{\tilde x_0}(t) = x_0 \cos \omega t \,} \]

In other words, this object oscillates classically: we again have classical behavior of a quantum state!


Example 200 


What is the expectation value of $\hat p$ as a function of time?

This expectation should not just be zero for all time, since an object that is oscillating must move and therefore must have some momentum. We plug in the Heisenberg operator $\hat p_H$ to find that

\[ \langle \hat p \rangle_{\tilde x_0}(t) = \langle \tilde x_0 | (\hat p \cos \omega t - m\omega\, \hat x \sin \omega t) | \tilde x_0 \rangle = \boxed{\, -m\omega x_0 \sin \omega t \,} \]

Indeed, we now find that

\[ \langle \hat p \rangle_{\tilde x_0}(t) = m \frac{d}{dt} \langle \hat x \rangle_{\tilde x_0}(t), \]

so we get classical behavior in the momentum as well.

But now here's the key calculation: we want to show that we have coherent evolution. In the harmonic oscillator ground state, we have a minimum uncertainty packet: the ground state has some $\Delta x$ and $\Delta p$, where their product saturates the uncertainty principle. But to make sure we have coherency, we just need to make sure that the uncertainties remain the same and that they're saturated. That would imply that the shape is always Gaussian, so we do have the same shape moving around in this classical manner.


Example 201 


What are the uncertainties $\Delta x$ and $\Delta p$ as a function of time?

This is an example now where the calculation becomes a nightmare if we don't have the Heisenberg picture. We know that (using the usual formula for uncertainty)

\[ (\Delta x)^2(t) = \langle \tilde x_0, t | \hat x^2 | \tilde x_0, t \rangle - \langle \tilde x_0, t | \hat x | \tilde x_0, t \rangle^2, \]

and we've calculated the second term already: we have

\[ (\Delta x)^2(t) = \langle \tilde x_0 | \hat x_H^2(t) | \tilde x_0 \rangle - x_0^2 \cos^2 \omega t. \]

And now we focus on the first term: expanding out the expression for $\hat x_H^2$, we have

\[ \langle \tilde x_0 | \hat x_H^2(t) | \tilde x_0 \rangle = \langle \tilde x_0 | \left( \hat x^2 \cos^2 \omega t + \frac{\hat p^2}{m^2\omega^2} \sin^2 \omega t + \frac{1}{m\omega} \cos \omega t \sin \omega t\, (\hat x \hat p + \hat p \hat x) \right) | \tilde x_0 \rangle, \]

and now we can calculate each of these terms by referring to the “small exercises” above: all of this simplifies to

\[ = \left( x_0^2 + \frac{\hbar}{2m\omega} \right) \cos^2 \omega t + \frac{m\hbar\omega}{2m^2\omega^2} \sin^2 \omega t + 0, \]

so plugging this into our uncertainty, the $x_0^2$ terms cancel and we just have

\[ (\Delta x)^2(t) = \frac{\hbar}{2m\omega} (\cos^2 \omega t + \sin^2 \omega t) = \frac{\hbar}{2m\omega}. \]

So the time dependence disappears: the uncertainty $\Delta x$ remains the same throughout the process! As an exercise, we can verify that we indeed have

\[ (\Delta p)^2(t) = \frac{m\hbar\omega}{2}, \]

and now $\Delta p\, \Delta x = \frac{\hbar}{2}$ is a saturation of the uncertainty principle, so our state maintains its shape through time-evolution.
So we'll now turn our attention to looking at this coherent state in the energy basis. Somehow, we've created a 
superposition of different energy states that move nicely together — if we can understand where this comes from, we'll 
be able to generalize our coherent states completely. 


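If we write

\[ \hat p = i \sqrt{\frac{m\omega\hbar}{2}}\, (\hat a^\dagger - \hat a), \]

there is a famous length scale in the harmonic oscillator

\[ d_0^2 \equiv \frac{\hbar}{m\omega}, \qquad \text{so that} \qquad \hat p = \frac{i\hbar}{\sqrt{2}\, d_0} (\hat a^\dagger - \hat a). \]

This is basically the uncertainty of the position in the ground state up to a factor of $\sqrt{2}$, and this is the only way we can really construct a length by dimensional analysis. So let's plug in the expression for $\hat p$ into our coherent state: we have that

\[ |\tilde x_0\rangle = \exp\left( \frac{x_0}{\sqrt{2}\, d_0} (\hat a^\dagger - \hat a) \right) |0\rangle. \]

So now this is nicer because $\frac{x_0}{d_0}$ has no units, and the operators $\hat a$ and $\hat a^\dagger$ have no units either. (And remember that $\hat a^\dagger - \hat a$ is anti-Hermitian, so it makes sense that we have no $i$ in the exponent.) We're going to reorder this exponential, which is a job for our Baker-Campbell-Hausdorff formula:

\[ e^{X + Y} = e^X e^Y e^{-\frac{1}{2} [X, Y]} \]

as long as $[X, Y]$ commutes with both $X$ and $Y$. The idea here is that we want to split up the creation and annihilation operators (we want them in separate exponentials) because we don't want to have to expand powers of $\hat a^\dagger - \hat a$. But if we use the formula, we have that (letting $X = \frac{x_0}{\sqrt{2}\, d_0} \hat a^\dagger$ and $Y = -\frac{x_0}{\sqrt{2}\, d_0} \hat a$)

\[ e^{\frac{x_0}{\sqrt{2} d_0} \hat a^\dagger - \frac{x_0}{\sqrt{2} d_0} \hat a} \]

(note that we choose things in this order so that the $\hat a$ annihilators act on $|0\rangle$ first, because instead of creating states, we can kill the vacuum!) can be rewritten as

\[ e^{\frac{x_0}{\sqrt{2} d_0} \hat a^\dagger}\, e^{-\frac{x_0}{\sqrt{2} d_0} \hat a}\, e^{-\frac{1}{2} [X, Y]}, \]

and $[X, Y]$ is now a number (which is good, because it now commutes with $X$ and $Y$) equal to $\frac{x_0^2}{2 d_0^2}$, where we've used that $[\hat a^\dagger, \hat a] = -1$. This gives us a final coherent state of the form

\[ |\tilde x_0\rangle = e^{\frac{x_0}{\sqrt{2} d_0} \hat a^\dagger}\, e^{-\frac{x_0}{\sqrt{2} d_0} \hat a}\, e^{-\frac{x_0^2}{4 d_0^2}} |0\rangle. \]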
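The Baker-Campbell-Hausdorff identity used above can be sanity-checked on finite matrices. The sketch below is an illustration under an assumption: it replaces $\hat a^\dagger$ and $\hat a$ by strictly upper-triangular $3 \times 3$ matrices, which have the same key property (their commutator commutes with both of them) and whose exponentials are finite sums. The matrices and helper names are hypothetical and chosen only for this check.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def add(a, b, scale=1.0):
    return [[a[i][j] + scale * b[i][j] for j in range(3)] for i in range(3)]

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def exp_nilpotent(m):
    """exp(m) for a strictly upper-triangular 3x3 matrix, where m^3 = 0 exactly."""
    return add(add(IDENTITY, m), matmul(m, m), 0.5)

u, v = 0.9, -1.4                                   # arbitrary illustrative numbers
X = [[0.0, u, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
Y = [[0.0, 0.0, 0.0], [0.0, 0.0, v], [0.0, 0.0, 0.0]]
comm = add(matmul(X, Y), matmul(Y, X), -1.0)       # [X, Y], which commutes with X and Y here

lhs = exp_nilpotent(add(X, Y))
minus_half_comm = [[-0.5 * c for c in row] for row in comm]
rhs = matmul(matmul(exp_nilpotent(X), exp_nilpotent(Y)), exp_nilpotent(minus_half_comm))
print(lhs)
print(rhs)
```

The two printed matrices agree entry by entry; without the factor $e^{-\frac{1}{2}[X, Y]}$ the top-right entries would differ by exactly $\frac{1}{2}uv$.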


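Now the $e^{-x_0^2/(4 d_0^2)}$ is just some number, and then the exponential of the annihilation operator is 1 plus some annihilator terms, and everything except the 1 kills the vacuum! So the annihilator exponential acts as the identity on $|0\rangle$, and now we can write that

\[ |\tilde x_0\rangle = e^{-\frac{x_0^2}{4 d_0^2}}\, e^{\frac{x_0}{\sqrt{2} d_0} \hat a^\dagger} |0\rangle = e^{-\frac{x_0^2}{4 d_0^2}} \sum_{n=0}^\infty \frac{1}{n!} \left( \frac{x_0}{\sqrt{2}\, d_0} \right)^n (\hat a^\dagger)^n |0\rangle, \]

and now we plug in our $n$th energy eigenstates:

\[ |n\rangle = \frac{(\hat a^\dagger)^n}{\sqrt{n!}} |0\rangle, \]

so we now have an expression for our coherent state in terms of the energy eigenstates:

\[ |\tilde x_0\rangle = e^{-\frac{x_0^2}{4 d_0^2}} \sum_{n=0}^\infty \frac{1}{\sqrt{n!}} \left( \frac{x_0}{\sqrt{2}\, d_0} \right)^n |n\rangle. \]

If we think of this as a sum $\sum_n c_n |n\rangle$, we have a precise combination of energy eigenstates, and we can calculate $|c_n|^2$, which tells us the probability to find the coherent state in the $n$th eigenstate. We find that

\[ |c_n|^2 = e^{-\frac{x_0^2}{2 d_0^2}}\, \frac{1}{n!} \left( \frac{x_0^2}{2 d_0^2} \right)^n. \]

And now we have the same expression inside the exponential and the power: if we define $\lambda = \frac{x_0^2}{2 d_0^2}$, we now have that

\[ |c_n|^2 = e^{-\lambda} \frac{\lambda^n}{n!}. \]

This is called the Poisson distribution! In other words, the energy is (in some sense) Poisson distributed in a coherent state.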


Remark 202. Poisson distributions come up when we, for example, have a radioactive material of a certain lifetime. 


Then the number of events that happen in a week is Poisson distributed. 


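We'll first check that we do have a probability distribution:

\[ \sum_{n=0}^\infty |c_n|^2 = e^{-\lambda} \sum_{n=0}^\infty \frac{\lambda^n}{n!}, \]

and now the infinite sum is the power series for $e^\lambda$, so this does evaluate to 1. One relevant property of any probability distribution is its expectation value (this isn't necessarily the most probable $n$, but we can still find it):

\[ \langle n \rangle = \sum_{n=0}^\infty n |c_n|^2 = e^{-\lambda} \sum_{n=0}^\infty n \frac{\lambda^n}{n!}, \]

and we can get this by applying a $\lambda$-derivative:

\[ \langle n \rangle = e^{-\lambda}\, \lambda \frac{d}{d\lambda} \sum_{n=0}^\infty \frac{\lambda^n}{n!} = e^{-\lambda}\, \lambda\, e^{\lambda} = \lambda. \]

Example 203

Remembering that $\lambda = \frac{x_0^2}{2 d_0^2}$, this tells us that if we have $x_0 = 1000 d_0$ (that is, we've moved the particle 1000 times the quantum uncertainty), then $\lambda = 5 \times 10^5$ and the most strongly occupied levels are on the order of 1 million.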
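The normalization and the mean of the Poisson weights can also be checked numerically. This is a minimal sketch with a hypothetical small displacement ($x_0 = 4 d_0$, so $\lambda = 8$) and an assumed truncation of the sum at $n = 200$; the function name is made up for illustration.

```python
import math

def poisson_weights(lam, n_max):
    """|c_n|^2 = exp(-lam) lam^n / n! for n = 0..n_max, built recursively."""
    weights = [math.exp(-lam)]
    for n in range(1, n_max + 1):
        weights.append(weights[-1] * lam / n)   # ratio of consecutive terms is lam / n
    return weights

x0_over_d0 = 4.0                        # hypothetical displacement in units of d_0
lam = x0_over_d0 ** 2 / 2               # lambda = x_0^2 / (2 d_0^2) = 8
w = poisson_weights(lam, 200)           # truncation at n = 200 is an assumption

total = sum(w)
mean_n = sum(n * p for n, p in enumerate(w))
var_n = sum(n * n * p for n, p in enumerate(w)) - mean_n ** 2
print(total, mean_n, var_n)             # total is close to 1; mean and variance are close to lam
```

The variance also equals $\lambda$, so the spread in level number is $\sqrt{\lambda}$, which is what controls the energy uncertainty of the coherent state.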


And we can think of our occupation number by using the expectation value of the number operator:

\[ \langle \tilde x_0 | \hat N | \tilde x_0 \rangle = \sum_{m,n} c_m^* \langle m | \hat N | n \rangle c_n, \]

where we've substituted in the values of $|\tilde x_0\rangle$ and its corresponding bra. Since $|n\rangle$ is an eigenvector of $\hat N$ with eigenvalue $n$, and $c_n^* = c_n$, this all just reduces to

\[ = \sum_{m,n} c_m c_n\, n\, \delta_{mn} = \sum_n n\, c_n^2, \]

which is again the $\lambda$ we were just talking about. From here, we can do some more calculations: we've found the expectation value of the energy, and it's worth also thinking about the uncertainty as well. Is the set of energies sharply peaked or more spread out? This is left as an exercise for us, and it turns out that

\[ (\Delta E)_{\tilde x_0} = \hbar\omega \frac{x_0}{\sqrt{2}\, d_0} \implies \frac{\Delta E}{\hbar\omega} = \frac{x_0}{\sqrt{2}\, d_0}. \]

So the energy uncertainty for a classical-looking coherent state (that is, where $x_0 \gg d_0$) has $\Delta E$ large compared to the level spacing of the harmonic oscillator. So lots of different energy levels will be excited, but we also know that

\[ \frac{\langle E \rangle}{\Delta E} \approx \frac{\frac{1}{2} m\omega^2 x_0^2}{\hbar\omega \frac{x_0}{\sqrt{2}\, d_0}} = \frac{x_0}{\sqrt{2}\, d_0}. \]

So this state has an interesting property: the energy uncertainty corresponds to many different levels of energy eigenstates, but this uncertainty is still much smaller (in fact by the same factor) compared to the actual average energy. In other words, we have a state with an almost definite energy, containing many levels in the oscillator.
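To put rough numbers on this, here is the arithmetic for the displacement $x_0 = 1000\, d_0$ used in Example 203. The mean energy $\hbar\omega(\lambda + \frac{1}{2})$ follows from $\langle \hat N \rangle = \lambda$; the value of $\Delta E$ is the one quoted above.

```latex
\begin{align*}
\lambda &= \frac{x_0^2}{2 d_0^2} = \frac{(1000)^2}{2} = 5 \times 10^5, \\
\frac{\Delta E}{\hbar\omega} &= \frac{x_0}{\sqrt{2}\, d_0} = \sqrt{\lambda} \approx 707, \\
\frac{\langle E \rangle}{\Delta E} &= \frac{\lambda + \tfrac{1}{2}}{\sqrt{\lambda}} \approx 707 .
\end{align*}
```

So a band of levels of width of order 700 around $n \approx 5 \times 10^5$ is appreciably occupied, yet the relative spread $\Delta E / \langle E \rangle$ is only about 0.14 percent.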


And now we're ready to generalize our coherent states in a way that makes them more flexible: 


Definition 204 


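The $\alpha$ coherent state for $\alpha \in \mathbb{C}$ is defined to be

\[ |\alpha\rangle \equiv D(\alpha) |0\rangle = \exp\left( \alpha \hat a^\dagger - \alpha^* \hat a \right) |0\rangle. \]

The operator $D(\alpha)$ is unitary: its exponent is

\[ \alpha \hat a^\dagger - \alpha^* \hat a = -\left( \alpha \hat a^\dagger - \alpha^* \hat a \right)^\dagger, \]

which is equal to minus its own dagger, so the exponent is anti-Hermitian and therefore the exponential is unitary. And we can check that when $\alpha$ is real, this will reduce to the previous case with $\alpha = \frac{x_0}{\sqrt{2}\, d_0}$.

First of all, let's calculate

\[ \hat a |\alpha\rangle = \hat a \exp\left( \alpha \hat a^\dagger - \alpha^* \hat a \right) |0\rangle. \]

We know that $\hat a$ kills the vacuum, so we would want to switch around the two terms: we can replace the product with a commutator

\[ = \left[ \hat a, \exp\left( \alpha \hat a^\dagger - \alpha^* \hat a \right) \right] |0\rangle. \]

This is again in the formula sheet: Campbell-Baker-Hausdorff says that

\[ [A, e^B] = [A, B]\, e^B \]

as long as $[A, B]$ commutes with $B$. So taking $A = \hat a$ and $B = \alpha \hat a^\dagger - \alpha^* \hat a$, their commutator is just the number $\alpha$, and we find that

\[ \boxed{\, \hat a |\alpha\rangle = \alpha |\alpha\rangle \,} \]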


So we've now diagonalized a non-Hermitian operator: we found its eigenvalues! Unfortunately, we don't get any of the nice theorems about Hermitian operators: states of different eigenvalues aren't orthogonal, and they don't form an orthonormal basis, so nothing works quite as nicely as we want. But this is still pretty remarkable: coherent states are eigenstates of the annihilation operator.
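The eigenvalue equation can be checked in the energy basis. The sketch below assumes the expansion $c_n = e^{-|\alpha|^2/2} \alpha^n / \sqrt{n!}$, which is the standard generalization of the real-$\alpha$ expansion derived earlier (the same reordering argument goes through for complex $\alpha$), and it truncates the basis at an arbitrary $n = 60$. The function names and the value of `alpha` are illustrative choices.

```python
import cmath
import math

def coherent_coefficients(alpha, n_max):
    """c_n = exp(-|alpha|^2 / 2) alpha^n / sqrt(n!) for n = 0..n_max."""
    coeffs = [cmath.exp(-abs(alpha) ** 2 / 2)]
    for n in range(1, n_max + 1):
        coeffs.append(coeffs[-1] * alpha / math.sqrt(n))
    return coeffs

def annihilate(coeffs):
    """Apply a: a|n> = sqrt(n)|n-1>, so (a c)_n = sqrt(n + 1) c_{n+1}."""
    return [math.sqrt(n + 1) * coeffs[n + 1] for n in range(len(coeffs) - 1)]

alpha = 1.2 - 0.7j                      # arbitrary complex label
c = coherent_coefficients(alpha, 60)    # truncating at n = 60 is an assumption
lowered = annihilate(c)

# Compare a|alpha> with alpha|alpha> component by component.
max_error = max(abs(lowered[n] - alpha * c[n]) for n in range(len(lowered)))
norm = sum(abs(x) ** 2 for x in c)
print(max_error, norm)
```

The component-wise error is at the level of floating-point rounding, and the norm is 1 up to the (negligible) truncated tail.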

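The physical interpretation of such a state is that $|\alpha\rangle$ is a coherent state with some initial momentum (in the real case where our position is $x_0 \cos \omega t$, the particle starts off with zero momentum). Indeed, we can check that

\[ \langle \alpha | \hat x | \alpha \rangle = \frac{d_0}{\sqrt{2}} \langle \alpha | (\hat a + \hat a^\dagger) | \alpha \rangle. \]

We can now apply $\hat a$ on the $|\alpha\rangle$ ket and $\hat a^\dagger$ on the $\langle \alpha |$ bra, and we find that this is equal to

\[ = \frac{d_0}{\sqrt{2}} (\alpha + \alpha^*) = d_0 \sqrt{2}\, \mathrm{Re}(\alpha). \]

Similarly, we can calculate the expectation value of the momentum: we find that

\[ \langle \alpha | \hat p | \alpha \rangle = \frac{\sqrt{2}\, \hbar}{d_0}\, \mathrm{Im}(\alpha). \]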
The formulas are a bit messy, but the main point is that the real part of $\alpha$ corresponds to the position of the coherent state, while the imaginary part of $\alpha$ tells us the initial momentum. And we can describe this geometrically by considering the complex $\alpha$-plane: this $\alpha$ vector will then evolve in time in a nice way, because

\[ |\alpha, t\rangle = e^{-iHt/\hbar}\, e^{\alpha \hat a^\dagger - \alpha^* \hat a}\, e^{iHt/\hbar}\, e^{-iHt/\hbar} |0\rangle \]

(where we've inserted the factor $e^{iHt/\hbar} e^{-iHt/\hbar} = 1$ to make computation nicer), and now the last two terms evaluate to $e^{-i\omega t/2} |0\rangle$ (we have an energy eigenstate for $H$ of the ground state energy $\frac{\hbar\omega}{2}$) while the first three terms are basically the Heisenberg operator, except that we have opposite signs for the $t$. Thus, we have the Heisenberg operator at time $-t$:

\[ |\alpha, t\rangle = e^{\alpha \hat a_H^\dagger(-t) - \alpha^* \hat a_H(-t)}\, e^{-i\omega t/2} |0\rangle. \]

But we have the formula for the Heisenberg operators $\hat a_H$ and $\hat a_H^\dagger$, and plugging those in (which just gives us an additional phase) yields

\[ |\alpha, t\rangle = e^{\alpha e^{-i\omega t} \hat a^\dagger - \alpha^* e^{i\omega t} \hat a}\, e^{-i\omega t/2} |0\rangle. \]

So $\alpha$ has now become $\alpha e^{-i\omega t}$! In other words,

\[ \boxed{\, |\alpha, t\rangle = e^{-i\omega t/2}\, |e^{-i\omega t} \alpha\rangle \,} \]

The $e^{-i\omega t/2}$ in the front is an irrelevant phase for the whole state: all that's happening is that $\alpha$ is rotating in a circle with frequency $\omega$ in our complex plane, and at any time, the real part is proportional to the expectation value of the position, while the imaginary part is proportional to the expectation value of the momentum.
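The rotation of $\alpha$ can be checked directly in the energy basis by evolving each component with its own phase. As before, this sketch assumes the expansion $c_n = e^{-|\alpha|^2/2} \alpha^n / \sqrt{n!}$ and a truncation at $n = 60$; the parameter values and function names are arbitrary illustrative choices.

```python
import cmath
import math

def coherent_coefficients(alpha, n_max):
    """c_n = exp(-|alpha|^2 / 2) alpha^n / sqrt(n!) for n = 0..n_max."""
    coeffs = [cmath.exp(-abs(alpha) ** 2 / 2)]
    for n in range(1, n_max + 1):
        coeffs.append(coeffs[-1] * alpha / math.sqrt(n))
    return coeffs

def evolve(coeffs, omega, t):
    """Schrodinger evolution: each |n> picks up exp(-i (n + 1/2) omega t)."""
    return [c * cmath.exp(-1j * (n + 0.5) * omega * t) for n, c in enumerate(coeffs)]

def mean_position_over_d0(coeffs):
    """<x> / d_0 = sqrt(2) Re <a>, with <a> = sum_n conj(c_n) sqrt(n + 1) c_{n+1}."""
    mean_a = sum(coeffs[n].conjugate() * math.sqrt(n + 1) * coeffs[n + 1]
                 for n in range(len(coeffs) - 1))
    return math.sqrt(2) * mean_a.real

alpha, omega, t = 1.5, 2.0, 0.37        # real alpha: displaced ground state, x_0 = sqrt(2) d_0 alpha
evolved = evolve(coherent_coefficients(alpha, 60), omega, t)
rotated = [cmath.exp(-0.5j * omega * t) * c
           for c in coherent_coefficients(alpha * cmath.exp(-1j * omega * t), 60)]

max_error = max(abs(a - b) for a, b in zip(evolved, rotated))
print(max_error)
print(mean_position_over_d0(evolved), math.sqrt(2) * alpha * math.cos(omega * t))
```

The first number is at rounding level, confirming $|\alpha, t\rangle = e^{-i\omega t/2} |e^{-i\omega t}\alpha\rangle$ component by component, and the last line reproduces $\langle \hat x \rangle(t) = x_0 \cos \omega t$ in units of $d_0$.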


Fact 205

We do need to choose appropriate units on our real and imaginary axes: the real axis measures $\frac{\langle \hat x \rangle}{\sqrt{2}\, d_0}$, and the imaginary axis measures $\frac{\langle \hat p \rangle\, d_0}{\sqrt{2}\, \hbar}$. This way, we do indeed rotate in a circle, rather than just an ellipse.


So in summary of everything we've been discussing: coherent states came out of taking the ground state of our harmonic oscillator and displacing it with some translation operator. But ultimately, the reason this all works out is that this operator is actually an exponential of something depending on our creation and annihilation operators, which allowed us to define a more general $\alpha$ coherent state. Because our operator $D(\alpha)$ is unitary, our state $|\alpha\rangle$ is indeed well-normalized, and in fact this state remains a coherent state with a value of $\alpha$ rotating with some angular velocity $\omega$ in the complex plane.

We'll finish by developing one more idea here. $\alpha$ coherent states are not position, momentum, or energy eigenstates, so there are measures of uncertainty in each of these observables. So we should really draw $\alpha$ as a kind of Gaussian blob: while we know the exact value of the expectation value of $\hat x$ and $\hat p$, that's not necessarily going to be the exact value that we measure.

And we'll relate that to the concept of an electromagnetic wave. Suppose this EM wave has energy $E$, and its electric field is described by a function $A \cos \omega t$. Earlier, we discussed briefly the idea of energy-time uncertainty, so let's do a handwavy argument first. The phase of this wave $\phi = \omega t$ has some error

\[ \frac{\Delta \phi}{\omega} = \Delta t, \]

and now this wave has an energy

\[ E = N \hbar\omega \implies \Delta E = \Delta N\, \hbar\omega, \]

where $N$ is the number of photons. Then

\[ \Delta E\, \Delta t \sim \frac{\hbar}{2} \implies \Delta N\, \hbar\omega\, \frac{\Delta \phi}{\omega} \sim \frac{\hbar}{2} \implies \boxed{\, \Delta N\, \Delta \phi \sim 1 \,} \]

(dropping the factor of $\frac{1}{2}$, since this is only an order-of-magnitude estimate). This last relation is actually taken somewhat seriously: comparing the uncertainty in the number of photons and the phase in a wave in quantum optics does yield a result that looks like this. Of course, our derivation is bad (we haven't really explained what $\Delta t$ means), but this gives us a bit of intuition.


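So let's do a more explicit calculation: in our coherent state, we know that

\[ \langle \hat N \rangle_\alpha = \langle \alpha | \hat a^\dagger \hat a | \alpha \rangle, \]

and now we can have $\hat a^\dagger$ act on the left and $\hat a$ act on the right to get

\[ \langle \alpha | \alpha^* \alpha | \alpha \rangle = |\alpha|^2 \langle \alpha | \alpha \rangle = |\alpha|^2. \]

So in a harmonic oscillator, the expectation of the number operator is the squared length of $\alpha$. Similarly, we can find that

\[ \langle \hat N^2 \rangle_\alpha = \langle \alpha | \hat a^\dagger \hat a\, \hat a^\dagger \hat a | \alpha \rangle = |\alpha|^2 \langle \alpha | \hat a\, \hat a^\dagger | \alpha \rangle. \]

Now $\hat a$ and $\hat a^\dagger$ are kind of in the wrong order (we want $\hat a$ to act on a ket, but it's acting on a bra), so we replace this object by the commutator plus its reverse order: this yields

\[ = |\alpha|^2 \left( 1 + \langle \alpha | \hat a^\dagger \hat a | \alpha \rangle \right) = \boxed{\, |\alpha|^2 (1 + |\alpha|^2) \,} \]

And this allows us to calculate the uncertainty in $N$:

\[ \Delta N = \sqrt{|\alpha|^4 + |\alpha|^2 - |\alpha|^4} = |\alpha|. \]

In other words, the uncertainty in $N$ is the square root of the expected value of $N$, and thus the “length” of $\alpha$ in our complex plane is actually $\Delta N$, not $\langle N \rangle$.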
And now we can be more precise with our uncertainty relation. $\alpha$ is rotating in our complex plane, and the uncertainty of the position, and also the momentum, in a coherent state are just the uncertainty in the ground state:

\[ \Delta x = \frac{d_0}{\sqrt{2}}, \qquad \Delta p = \frac{\hbar}{\sqrt{2}\, d_0}. \]

Remembering how we chose the units on our real and imaginary axis, we find that our Gaussian “blob” of $\alpha$ spans an uncertainty on the order of $\frac{1}{2}$ in the real axis, as well as in the imaginary axis. So we can say that the diameter of the Gaussian blob is on the order of 1, so the phase of $\alpha$ in the complex plane has some amount of uncertainty as well! Since the length of $\alpha$ is $\Delta N$, and the blob covers a length of 1 along the circumference of the circle, this tells us that the uncertainty in the angle is $\Delta \phi \sim \frac{1}{\Delta N}$. So this at least gives us a picture of where the equation

\[ \Delta N\, \Delta \phi \sim 1, \]

the phase uncertainty relation, originates from.
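As a rough numerical illustration of this picture (the value $|\alpha| = 100$ is a hypothetical choice, not taken from the lecture):

```latex
\begin{align*}
|\alpha| = 100: \qquad
\langle \hat N \rangle &= |\alpha|^2 = 10^4, &
\Delta N &= |\alpha| = 100, &
\Delta \phi &\sim \frac{1}{\Delta N} = 0.01 \text{ rad}.
\end{align*}
```

So a coherent state with ten thousand quanta has a 1 percent relative spread in number and a phase defined to about a hundredth of a radian; a larger $|\alpha|$ sharpens the phase further, which is the sense in which such states behave classically.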


22 March 9, 2020 


We've now started talking about Heisenberg operators and coherent states — questions on this material were postponed 
last lecture, but we can talk about them now. And we'll do some practice problems to help prepare for the exam. 
(There will be a formula sheet, but the test is closed book, closed notes. If we have a request for a particular equation, 
we should let the 8.051 instructors know.) 

As a general rule, we should not rely too much on trying to memorize equations and principles, but we should know
by heart things like the harmonic oscillator Hamiltonian or the Schrödinger equation because we've worked with them
repeatedly. For example, if we know that we can write
\[ \hat H = \frac{\hat p^2}{2m} + \frac12 m\omega^2\hat x^2 = \hbar\omega\left(\hat N + \frac12\right), \]


where the number operator is $\hat N = \hat a^\dagger\hat a$, we know that $\hat x$ and $\hat p$ are some constants times $\hat a + \hat a^\dagger$ and $\hat a - \hat a^\dagger$. We know
that $\sqrt{\hbar/(m\omega)}$ is a helpful length scale, because $\hbar$ has units of $E/\omega$, which means $\hbar\omega$ has units of $E = m\omega^2L^2$ (by comparing
it to the $\frac12 m\omega^2\hat x^2$ term). And this tells us that
\[ L^2 = \frac{\hbar}{m\omega}, \qquad L = \sqrt{\frac{\hbar}{m\omega}}, \]
which tells us that the $\hat x$ operator should have a $\sqrt{\hbar/(m\omega)}$ factor, plus some additional constants. And if we're not sure,
we can always check them by using $[\hat a, \hat a^\dagger] = 1$. And this means that we can find out what the explicit formulas for $\hat a$
and $\hat a^\dagger$ look like by taking linear combinations of $\hat x$ and $\hat p$.
So in this example, knowing that we should write the Hamiltonian as $\hbar\omega\left(\hat N + \frac12\right)$ is important on an intuitive level
to help us find the direction of how to proceed. But then the rest can be deduced step by step!
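As a sanity check of this kind of reconstruction, one can build the operators as finite matrices and confirm that the pieces fit together. This is a minimal sketch under stated assumptions: the matrices are truncated at a hypothetical size `dim`, the units $\hbar = m = \omega = 1$ are arbitrary, and the momentum is taken in the standard convention $\hat p = i\sqrt{m\omega\hbar/2}\,(\hat a^\dagger - \hat a)$.

```python
import numpy as np

hbar, m, omega = 1.0, 1.0, 1.0      # illustrative units
dim = 12                             # hypothetical truncation of the Fock space

a = np.diag(np.sqrt(np.arange(1, dim)), k=1)     # annihilation operator
adag = a.conj().T                                # creation operator

x = np.sqrt(hbar / (2 * m * omega)) * (a + adag)
p = 1j * np.sqrt(m * omega * hbar / 2) * (adag - a)

H = p @ p / (2 * m) + 0.5 * m * omega ** 2 * (x @ x)
expected = hbar * omega * (adag @ a + 0.5 * np.eye(dim))

# The truncation spoils only the last row and column, so compare the rest.
block = slice(0, dim - 1)
max_error = np.max(np.abs(H[block, block] - expected[block, block]))
commutator = a @ adag - adag @ a
print(max_error)
print(np.round(np.diag(commutator).real, 12))    # 1 everywhere except the last entry
```

The last diagonal entry of the commutator is an artifact of cutting the infinite matrices off at `dim`, not a failure of $[\hat a, \hat a^\dagger] = 1$.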
Similarly, if we've been practicing, we should know the Pauli matrices pretty well (maybe knowing $\sigma_z$ and $\sigma_x$). But
also important is knowing their properties: they're Hermitian, traceless, and so on.


Example 206


Consider an infinite square well with
\[ V(x) = \begin{cases} 0 & -\frac{L}{2} \le x \le \frac{L}{2}, \\ \infty & \text{otherwise.} \end{cases} \]
Show that $\Delta x \le \frac{L}{2}$. Can this be saturated?


Note that


\[ (\Delta x)^2 = \langle x^2\rangle - \langle x\rangle^2 \le \langle x^2\rangle, \]


and since $|x| \le \frac{L}{2}$ everywhere, $\langle x^2\rangle \le \left(\frac{L}{2}\right)^2$. Taking square roots on both sides yields the desired result. But
equality would occur only if the particle is only found at $-\frac{L}{2}$ or $\frac{L}{2}$ (with equal probability of each so that $\langle x\rangle = 0$), and this
would give us a discontinuous wavefunction. So it is not possible to saturate the inequality.
By the way, a state with zero position uncertainty is a delta function, which is not normalizable (because we have
to square it)! So that’s a kind of degenerate state on the other end.
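To see how far actual states sit from the bound, here is a rough numerical sketch. It assumes the standard energy eigenfunctions of this well, $\psi_n(x) = \sqrt{2/L}\,\sin\!\big(n\pi(x + L/2)/L\big)$, which are not written out in the example, and the function name `delta_x` is only illustrative.

```python
import numpy as np

def delta_x(n, L=1.0, points=20001):
    """Position uncertainty of the n-th eigenstate of the well on [-L/2, L/2]."""
    x = np.linspace(-L / 2, L / 2, points)
    dx = x[1] - x[0]
    psi = np.sqrt(2 / L) * np.sin(n * np.pi * (x + L / 2) / L)
    prob = psi ** 2
    # psi vanishes at both walls, so a plain Riemann sum equals the trapezoid rule
    mean_x = np.sum(x * prob) * dx
    mean_x2 = np.sum(x ** 2 * prob) * dx
    return np.sqrt(mean_x2 - mean_x ** 2)

L = 1.0
for n in range(1, 6):
    print(n, delta_x(n, L), "bound:", L / 2)
```

For these states the uncertainty grows with $n$ but stays well below $L/2$, consistent with the claim that the inequality cannot be saturated.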


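Example 207


Consider position eigenstates in the simple harmonic oscillator: we wish to construct a state $|x\rangle$ with
\[ \hat x\,|x\rangle = x\,|x\rangle. \]
As a hint, $\hat x = \sqrt{\frac{\hbar}{2m\omega}}\,(\hat a + \hat a^\dagger)$, and we should use the ansatz


\[ |x\rangle = N(x)\exp\left(\beta\hat a^\dagger - \frac12\gamma\,\hat a^\dagger\hat a^\dagger\right)|0\rangle, \]


where $\beta, \gamma$ are constants to be determined with the eigenstate condition and $N(x)$ is determined with the overlap
$\langle 0|x\rangle$.


First, we use the eigenstate condition: this tells us (plugging in the definition of $\hat x$) that


\[ \hat x\,|x\rangle = \sqrt{\frac{\hbar}{2m\omega}}\,(\hat a + \hat a^\dagger)\,N(x)\exp\left(\beta\hat a^\dagger - \frac12\gamma\,\hat a^\dagger\hat a^\dagger\right)|0\rangle = x\,N(x)\exp\left(\beta\hat a^\dagger - \frac12\gamma\,\hat a^\dagger\hat a^\dagger\right)|0\rangle. \]


We wish to move $\hat a$ past the exponential term (so that it can annihilate $|0\rangle$), but this makes us pick up a commutator term: since
$[A, e^B] = [A, B]e^B$ if $[A, B]$ is a constant (more generally, when $[[A, B], B] = 0$), we start this problem by doing the
side calculation


\[ \left[\hat a,\; e^{\beta\hat a^\dagger - \frac12\gamma\hat a^\dagger\hat a^\dagger}\right] = \left[\hat a,\; \beta\hat a^\dagger - \frac12\gamma\,\hat a^\dagger\hat a^\dagger\right] e^{\beta\hat a^\dagger - \frac12\gamma\hat a^\dagger\hat a^\dagger} = \left(\beta - \gamma\hat a^\dagger\right) e^{\beta\hat a^\dagger - \frac12\gamma\hat a^\dagger\hat a^\dagger}. \]


The whole point is that plugging everything back in, we require


\[ \sqrt{\frac{\hbar}{2m\omega}}\left(\beta - \gamma\hat a^\dagger + \hat a^\dagger\right) = x, \]


so $\gamma = 1$ and $\beta = \sqrt{\frac{2m\omega}{\hbar}}\,x$. We can see the rest of the solutions online.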


Here’s one more problem we can think about: 


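Example 208
Consider the uncertainty relation
\[ \Delta S_x\,\Delta S_y \ge C. \]


Find $C$, and rewrite it as $\Delta\sigma_x\,\Delta\sigma_y \ge C'$. Find the states that saturate this inequality (with $|\gamma| < 1$):


\[ (A - \langle A\rangle)\,|\psi\rangle = -i\gamma\,(B - \langle B\rangle)\,|\psi\rangle. \]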


23 Photon States and Two State Systems, Part 1 


Today, we're going to talk about a new kind of system — quantum states of the electromagnetic field. A photon 
is a discrete quantum of that field, so in some sense, this is a first introduction to quantum field theory! It turns out 
that the harmonic oscillator plays a pretty important role here, and a key idea will be the set of coherent states that 
we defined last time: 

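\[ |\alpha\rangle = D(\alpha)\,|0\rangle, \qquad D(\alpha) = e^{\alpha\hat a^\dagger - \alpha^*\hat a}. \]


(These states have the property that $\hat a|\alpha\rangle = \alpha|\alpha\rangle$.) So photon states have to do with the electric and magnetic field,
and we're going to try to do a quantum description of this by starting with a description of its energy. Recall that the
energy $E$ can be evaluated by an integral of the form
\[ E = \frac12\int d^3x\;\epsilon_0\left[\vec E^2(\vec r, t) + c^2\vec B^2(\vec r, t)\right]. \]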


We'll focus on a particular mode of this that we've seen from 8.02, where we have a finite volume cavity and a single
plane wave with some wavelength and some frequency. Suppose this wave is along the z-direction. Then we can write
our field as


\[ E_x(z, t) = \sqrt{\frac{2}{\epsilon_0 V}}\;\omega\,q(t)\,\sin(kz), \]


where $V$ is the volume of the cavity (we can think of it as a large box, or we can imagine it being almost infinite), and
$\omega$ and $k = \frac{\omega}{c}$ are the frequency and wavenumber of our electromagnetic wave, respectively. (The factor in the front is
just for normalization.) Here, $q(t)$ is some arbitrary function of time that we'll determine later, and the $\omega$ will make
more sense soon. By Maxwell's equations, this corresponds to a magnetic component in the y direction


\[ cB_y(z, t) = \sqrt{\frac{2}{\epsilon_0 V}}\;p(t)\,\cos(kz), \]


where $p(t)$ is some other arbitrary function of time (related to $q(t)$ by Maxwell's equations). We can check this
configuration more carefully, but for now the important thing is for us to think about the energy of such a system:
since we're squaring the E and B fields and integrating over the whole box of volume $V$, the prefactors will actually
disappear. This is because the average value of $\sin^2 kz$ and $\cos^2 kz$ is $\frac12$, and this is a valid approximation for us to
use if our box is large enough. Putting everything together, we'll find that our energy is


\[ E = \frac12\left(p(t)^2 + \omega^2 q(t)^2\right). \]


So now our $\omega$ makes a bit more sense: we're getting something that looks a lot like a harmonic oscillator, except
we're missing the mass terms. This means that we have different units for $p$ and $q$ than we do for a usual harmonic
oscillator.


Fact 209 


Notice, though, that we couldn't have done any better — photons have no mass, and we're trying to describe an 


electromagnetic field with photons. 


So we'll just resolve the units ourselves: $p$ must have units $[p] = \sqrt{E}$, and $q$ must have units $[q] = T\sqrt{E}$ (where
the extra time factor $T$ comes from $\omega$ having units of $\frac{1}{T}$). So at the end of the day, $pq$ has the units of $[T][E]$,
which are the units of $\hbar$. That’s a good sign! This perhaps motivates us, because there is a natural correspondence
between a mode of vibration of a classical electromagnetic field and an energy functional that looks like the harmonic
oscillator.


Proposition 210
We'll say that


\[ E = \frac12\left(\hat p(t)^2 + \omega^2\hat q(t)^2\right) \]


is a Hamiltonian, where $\hat p$ and $\hat q$ are the Heisenberg operators of the electromagnetic field.


So now we can say that
\[ \hat H = \frac12\left(\hat p^2 + \omega^2\hat q^2\right) \]
is our time-independent Hamiltonian. Even though $p$ and $q$ are functions of $t$, that's because we're taking those to
be the Heisenberg versions of our operators. While this might seem speculative, we can do some checks to make
sure that this is indeed reasonable. First of all, we should look at the Heisenberg equations of motion and compare
them to the classical equations of motion:
\[ i\hbar\,\frac{d}{dt}\hat p_H(t) = [\hat p_H(t), \hat H] \]
for a time-independent Hamiltonian, and we can also plug in Maxwell's equations and see what relations we get for
$q(t)$ and $p(t)$. It turns out (when we do this for homework) that those two sets of equations are exactly equivalent! So
it is indeed valid to think of our dynamical system with $\hat q$ and $\hat p$ as our quantum operators.
So remembering that we're using a harmonic oscillator, except with the mass $m$ set to 1 (this is okay because
we found that $p$ and $q$ have units that multiply to $\hbar$), we now have expressions for our two operators in terms of
creation and annihilation operators:


\[ \hat q = \sqrt{\frac{\hbar}{2\omega}}\,(\hat a + \hat a^\dagger), \qquad \hat p = \frac{1}{i}\sqrt{\frac{\omega\hbar}{2}}\,(\hat a - \hat a^\dagger), \]


just by setting $m = 1$. Indeed, the units match up over here, and then we can also write the Hamiltonian as


\[ \hat H = \hbar\omega\left(\hat a^\dagger\hat a + \frac12\right), \]


because $m$ doesn’t show up in this formula in the harmonic oscillator case anyway, and therefore we can also write this
in terms of the number operator:


\[ \hat H = \hbar\omega\left(\hat N + \frac12\right). \]


We'll now physically interpret this as saying that a state with some number of photons $n$ has the energy $n\hbar\omega$ (plus
the extra $\frac12\hbar\omega$). This is actually a huge assumption, because what we're doing is taking $\hat x$ and $\hat p$ and replacing them
with $\hat q$ and $\hat p$, but they have nothing to do with the usual position and momentum: they should represent the electric
and magnetic field. So really, E and B are becoming quantum operators! Quantum field theory is the whole idea that
fields are operators, and it seems to be a valid idea in this case: a state with $N$ photons is viewed as a state of a
“harmonic oscillator” with mass 1.

So now let's go to our Heisenberg form for our operators: we have equations above that tell us $\hat p$ and $\hat q$ in terms
of $\hat a$ and $\hat a^\dagger$, and just replacing everything with its Heisenberg counterpart now tells us that


\[ \hat q(t) = \sqrt{\frac{\hbar}{2\omega}}\left(e^{-i\omega t}\,\hat a + e^{i\omega t}\,\hat a^\dagger\right). \]


(Each operator on the right hand side just gains a phase.) And now substituting this back into our electric field, we
find that


\[ \hat E_x(z, t) = \mathcal{E}_0\left(e^{-i\omega t}\,\hat a + e^{i\omega t}\,\hat a^\dagger\right)\sin kz, \]


where $t$ is the time in a Heisenberg operator and $\mathcal{E}_0 = \sqrt{\frac{\hbar\omega}{\epsilon_0 V}}$. And the main point here is that this is now an
electromagnetic field operator!


Example 211


Let’s find the expectation value of this electric field operator to get a bit more intuition.


Suppose we have some photon energy eigenstate $|n\rangle$, so now we have a state with $n$ photons and a total energy
of $n\hbar\omega + \frac12\hbar\omega$. We now want to compute
\[ \langle\hat E_x\rangle_{|n\rangle} = \mathcal{E}_0\left(e^{-i\omega t}\,\langle n|\hat a|n\rangle + e^{i\omega t}\,\langle n|\hat a^\dagger|n\rangle\right)\sin kz, \]


and the idea is that this tells us how the electric field should look in this state $|n\rangle$. But $\hat a$ lowers $|n\rangle$ to $|n-1\rangle$, which
is orthogonal to $|n\rangle$, and similarly $\hat a^\dagger$ raises $|n\rangle$ to an orthogonal state as well. So the expectation value is zero, but
this isn’t that surprising: in an energy eigenstate, the state only changes by an overall phase in time, so nothing interesting
happens.

So let’s pick a more imaginative state. We've said many times that coherent states act like classical states, so
let's try putting in the state $|\alpha\rangle$ instead. Then
\[ \langle\hat E_x\rangle_{|\alpha\rangle} = \mathcal{E}_0\left(e^{-i\omega t}\,\langle\alpha|\hat a|\alpha\rangle + e^{i\omega t}\,\langle\alpha|\hat a^\dagger|\alpha\rangle\right)\sin kz, \]


and this time things are better: $\langle\alpha|\hat a|\alpha\rangle = \alpha\,\langle\alpha|\alpha\rangle = \alpha$, and similarly the other term evaluates to $\alpha^*$ (we evaluate on
the bra instead), and this all simplifies to
\[ = \mathcal{E}_0\left(e^{-i\omega t}\,\alpha + e^{i\omega t}\,\alpha^*\right)\sin kz, \]


and now this is great: the expectation value of the electric field looks like the traveling or stationary waves that
we see in 8.02! So a classical wave that resonates in a finite-volume cavity really is just a coherent state of the
electromagnetic field: even though this state of photons is not an eigenstate for energy, position, or momentum, we
still have a nice classical picture that looks like a normal wave. (And that also explains why lasers are coherent states
of the electromagnetic field: if the number uncertainty is large, the phase uncertainty can be very small.)


And now we can do this more explicitly: this can also be written as


\[ \langle\hat E_x\rangle_{|\alpha\rangle} = 2\mathcal{E}_0\,\mathrm{Re}\left(\alpha e^{-i\omega t}\right)\sin kz, \]
and now if we write $\alpha = |\alpha|e^{i\theta}$, this simplifies to
\[ = 2\mathcal{E}_0\,|\alpha|\cos(\omega t - \theta)\,\sin kz, \]


which is a standing wave with a fixed spatial distribution, and it has a nice classical description as well as a good
quantum description. And if we want to find the energy of this state, it’s the expectation value of the Hamiltonian:


\[ \langle\hat H\rangle = \hbar\omega\left(\langle\hat N\rangle + \frac12\right), \]


and in a coherent state, we know that $\langle\hat N\rangle = |\alpha|^2$. So the coherent state $|\alpha\rangle$ has $|\alpha|^2$ photons on average.
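The standing-wave formula can be compared against a direct matrix computation. The following is a sketch under assumptions, not part of the lecture: it uses the standard Fock-basis amplitudes of a coherent state, a hypothetical truncation `dim`, and it reports the field in units of $\mathcal{E}_0\sin kz$.

```python
import numpy as np

def coherent_vector(alpha, dim=80):
    """Assumed amplitudes c_n = exp(-|alpha|^2/2) alpha^n / sqrt(n!), truncated at dim."""
    n = np.arange(dim)
    log_factorial = np.concatenate(([0.0], np.cumsum(np.log(np.arange(1, dim)))))
    return (np.exp(-abs(alpha) ** 2 / 2) * np.power(complex(alpha), n)
            / np.exp(0.5 * log_factorial))

def field_expectation(alpha, omega, t, dim=80):
    """<alpha| e^{-i w t} a + e^{+i w t} a^dagger |alpha>, in units of E_0 sin(kz)."""
    a = np.diag(np.sqrt(np.arange(1, dim)), k=1).astype(complex)
    psi = coherent_vector(alpha, dim)
    operator = np.exp(-1j * omega * t) * a + np.exp(1j * omega * t) * a.conj().T
    return np.vdot(psi, operator @ psi).real     # vdot conjugates its first argument

alpha = 2.0 * np.exp(0.7j)      # |alpha| = 2, theta = 0.7
omega = 3.0
for t in (0.0, 0.4, 1.1):
    numeric = field_expectation(alpha, omega, t)
    classical = 2 * abs(alpha) * np.cos(omega * t - np.angle(alpha))
    print(t, numeric, classical)
```

The two columns should agree up to truncation error, which is negligible once `dim` is much larger than $|\alpha|^2$.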

This is basically all we will talk about with photon states — we could put together different superpositions of modes 
and discuss commutation relations of the field operators and so on, but that’s what quantum field theory is for. In 
summary, the main point of this discussion is that the harmonic oscillator has entered in an interesting way, such 
that we have an uncertainty between the E and B fields. The different energy levels of this oscillator correspond to 
different numbers of photons, and we get a classical description by considering coherent states — this is how we can 
recover the classical wave oscillations that are familiar from 8.02. 

We'll now spend the next few lectures on two-state systems, and the first topic of interest is that of spin 
precession. It seems like this is a very particular kind of problem when we have spins in magnetic fields, but it'll turn 
out that any two-state system can be thought of as a spin in a magnetic field, even if we're talking about an 
electron shared between two atoms or an ammonia molecule — mathematically, spins are what we've already become 
familiar with. 

Recall that this whole concept of spin precession comes up when we try to relate a particle’s magnetic moment
with its angular momentum. We made an argument earlier on in class that


\[ \vec\mu = \frac{q}{2m}\,\vec S, \]


where $\vec S$ is the classical angular momentum. But we also claim that this is true in quantum mechanics as well, except
for a few small modifications: we get a slightly different magnetic moment in the Hamiltonian, which gives us an
additional $g$ factor on the right side, and the $\vec S$ is now an intrinsic spin angular momentum. It's a bit abstract, and
the best way for us to view this object is as an operator! The magnetic dipole moment is then also an operator (since
it's a constant times the spin operator).

Recall that $g = 2$ for the electron (this is predicted both by Dirac’s relativistic equation for the electron and by
experimental results), and particles like the proton or neutron have different values of $g$. For example, a neutron has
three quarks (two with some charge, one with the opposite), and so it can have a nonzero
magnetic moment. At the end of the day, we'll simplify this with some notation and just write


\[ \vec\mu = \gamma\,\vec S, \]


where the constant $\gamma$ summarizes all of the factors that we gain throughout this process. The Hamiltonian for such a
system is just (subscript s for spin)


\[ \hat H_s = -\vec\mu\cdot\vec B, \]


and here $\vec B$ will typically be a static magnetic field, so that we don't have to quantize it and think of it as a quantum
field (as discussed above). And we'll typically write this also as


\[ \hat H_s = -\gamma\,\vec B\cdot\vec S. \]


Example 212


When the magnetic field only points in the z-direction (and is of the form $B\hat z$), our Hamiltonian simplifies to


\[ \hat H_s = -\gamma B\,\hat S_z. \]


Then the unitary operator that generates time evolution of states (for a time-independent Hamiltonian) is


\[ \exp\left(-\frac{i\hat H_s t}{\hbar}\right) = \exp\left(-\frac{i(-\gamma Bt)\,\hat S_z}{\hbar}\right). \]


Now we're going to use a property that we've justified in homework but will understand in more detail in the next few
lectures: we talked about the operator


\[ \hat R_{\vec n}(\alpha) = \exp\left(-\frac{i\alpha\,\hat S_{\vec n}}{\hbar}\right), \]


where $\vec n$ is a unit vector and $\hat S_{\vec n} = \vec n\cdot\vec S$. This was called the rotation operator: we verified with some calculations
that this rotates a spin state by an angle $\alpha$ around the axis $\vec n$. But any spin state also corresponds to a vector $\vec n'$, and
we're going to verify that this vector $\vec n'$ is indeed rotated by an angle $\alpha$.
So if we now look at the time evolution operator and $\hat R_{\vec n}$, notice that the time evolution operator just has $-\gamma Bt$ playing the role of $\alpha$ and
$\hat S_z$ playing the role of $\hat S_{\vec n}$. So we must do some kind of rotation here as well, and that’s the calculation we'll do now.


Example 213


Our magnetic field is still in the z-direction. Consider some arbitrary spin state pointing in some direction
$\vec n'$ with spherical coordinates $\theta_0, \phi_0$ (note that this is not the same as the vector $\vec n$ around which we're rotating
the states).


Our spin state is thus in the direction $\vec n'$ at time $t = 0$, and its general formula is


\[ |\psi, 0\rangle = \cos\frac{\theta_0}{2}\,|+\rangle + \sin\frac{\theta_0}{2}\,e^{i\phi_0}\,|-\rangle. \]
Now we'll apply the time-evolution operator to this state, but first we'll do a preliminary calculation:


\[ \hat H_s\,|+\rangle = -\gamma B\,\hat S_z\,|+\rangle = -\gamma B\,\frac{\hbar}{2}\,|+\rangle, \]


and similarly


\[ \hat H_s\,|-\rangle = -\gamma B\,\hat S_z\,|-\rangle = +\gamma B\,\frac{\hbar}{2}\,|-\rangle. \]
So now the state at any time is governed by our unitary time evolution operator:
\[ |\psi, t\rangle = e^{-i\hat H_s t/\hbar}\,|\psi, 0\rangle, \]


and now we can write $|\psi, 0\rangle$ out as a linear combination: because the $\hat H_s$ in the exponent acts on $|\pm\rangle$ with some eigenvalue,
we can put that eigenvalue into the exponent instead! And thus


\[ |\psi, t\rangle = \cos\frac{\theta_0}{2}\,e^{-i(-\gamma B\hbar/2)t/\hbar}\,|+\rangle + \sin\frac{\theta_0}{2}\,e^{i\phi_0}\,e^{-i(+\gamma B\hbar/2)t/\hbar}\,|-\rangle, \]


where we've just replaced $\hat H_s$ with its eigenvalues, and this simplifies to


\[ |\psi, t\rangle = \cos\frac{\theta_0}{2}\,e^{i\gamma Bt/2}\,|+\rangle + \sin\frac{\theta_0}{2}\,e^{i(\phi_0 - \gamma Bt/2)}\,|-\rangle. \]
We have some extra phase terms, so we need to factor those out to get this into our generic spin state form: factoring
out $e^{i\gamma Bt/2}$ yields an irrelevant overall phase, and this is just


\[ e^{i\gamma Bt/2}\left(\cos\frac{\theta_0}{2}\,|+\rangle + \sin\frac{\theta_0}{2}\,e^{i(\phi_0 - \gamma Bt)}\,|-\rangle\right). \]


So now the exponential term is $\exp\left(i(\phi_0 - \gamma Bt)\right)$, and now we know exactly what’s going on here: we have a spin
state where $\theta(t) = \theta_0$ is fixed, while $\phi(t) = \phi_0 - \gamma Bt$ is precessing at a constant rate. Indeed, this is what we claimed
with the rotation operator $\hat R_{\vec n}$ earlier on in the class! And the negative sign means that $\phi$ decreases in time.
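The precession result can be reproduced in a few lines. This is a minimal sketch with illustrative parameter values and hypothetical function names; it works in the $|\pm\rangle$ basis, where $\hat H_s$ is diagonal, and reads the angles back off the evolved spinor.

```python
import numpy as np

def evolve(theta0, phi0, gamma, B, t, hbar=1.0):
    """Apply exp(-i H_s t / hbar), with H_s = -gamma B S_z, to the state along (theta0, phi0)."""
    psi0 = np.array([np.cos(theta0 / 2), np.sin(theta0 / 2) * np.exp(1j * phi0)])
    energies = np.array([-gamma * B * hbar / 2, +gamma * B * hbar / 2])  # on |+>, |->
    return np.exp(-1j * energies * t / hbar) * psi0

def direction_angles(psi):
    """Recover (theta, phi) of the spin direction; the overall phase drops out."""
    up, down = psi
    theta = 2 * np.arctan2(abs(down), abs(up))
    phi = np.angle(down * np.conj(up))       # relative phase between |-> and |+>
    return theta, phi

theta0, phi0, gamma, B = 1.0, 0.5, 2.0, 1.5
for t in (0.0, 0.05, 0.1):
    theta, phi = direction_angles(evolve(theta0, phi0, gamma, B, t))
    print(t, theta, phi, phi0 - gamma * B * t)
```

Note that `np.angle` returns values in $(-\pi, \pi]$, so for longer times the recovered $\phi$ should be compared with $\phi_0 - \gamma Bt$ modulo $2\pi$.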

And now that we've done a calculation, we'll also present the general result so that this all becomes more clear.
Spin precession is both a quantum and a classical phenomenon: in the classical case, if we have a magnetic moment
$\vec\mu$ in a magnetic field $\vec B$, we have a torque

\[ \vec\tau = \vec\mu\times\vec B \]
(this is the computation where we have a square wire loop not aligned with a magnetic field from 8.02). But the rate of
change of angular momentum is this torque, so


\[ \frac{d\vec S}{dt} = \vec\tau = \vec\mu\times\vec B = \gamma\,\vec S\times\vec B = -(\gamma\vec B)\times\vec S. \]


This equation is a particular case of a famous equation in classical mechanics where we have a rotating vector


\[ \frac{d\vec x}{dt} = \vec\omega\times\vec x: \]

the solution is that a vector $\vec x$ rotates with angular frequency $|\vec\omega|$ around the axis defined by $\vec\omega$. So in the specific
case we're talking about, $\vec S$ plays the role of $\vec x$, and $-\gamma\vec B$ plays the role of $\vec\omega$. This gives us what’s called the Larmor
frequency


\[ \vec\omega_L = -\gamma\vec B. \]


Indeed, this is the same Larmor frequency that we derived in the phase $\phi$ in the quantum state, so we now have
derivations in both cases! And this isn’t a coincidence: we just made our classical variables into quantum operators,
and none of the physics is changing. $-\vec\mu\cdot\vec B$, the energy, just became the Hamiltonian, and now we can rewrite our
Hamiltonian in terms of the Larmor frequency:


\[ \hat H_s = -\vec\mu\cdot\vec B = -\gamma\,\vec B\cdot\vec S = \vec\omega_L\cdot\vec S. \]


What's important to keep in mind here is the main form of this equation: if a Hamiltonian is some vector dotted with
$\vec S$, that vector will be the Larmor frequency of rotation, and the spin states will rotate with angular frequency $\vec\omega_L$.
We'll finish by generalizing one level further so that we can understand why any two-state system can be thought of as a
spin system. This is maybe the best way to understand the physical effects of any Hamiltonian.


Example 214


Consider a time-independent Hamiltonian for a two-state system


\[ H = \begin{pmatrix} g_0 + g_3 & g_1 - ig_2 \\ g_1 + ig_2 & g_0 - g_3 \end{pmatrix}, \]


where $g_0, g_1, g_2, g_3 \in \mathbb{R}$.


Remember that “two-state system” means we have “two basis states,” so a Hamiltonian is a Hermitian 2 x 2 matrix,
and indeed the above matrix is the most general form allowed. But we've arranged our coefficients in such a way that
we can rewrite


\[ H = g_0 I + g_1\sigma_1 + g_2\sigma_2 + g_3\sigma_3, \]


because the Pauli matrices, along with the identity matrix, form a basis for all Hermitian 2 x 2 matrices. And now we
can write this Hamiltonian as
\[ H = g_0 I + \vec g\cdot\vec\sigma, \]


where $\vec g$ is $(g_1, g_2, g_3)$. And now if we write the $\vec g$ vector as

\[ \vec g = g\,\vec n \]

for some unit vector $\vec n$ (so that $g = |\vec g|$), we're saying that a general Hamiltonian can be understood as
\[ H = g_0 I + g\,\vec n\cdot\vec\sigma. \]


And we know how to work with this: we can diagonalize the matrix and find its eigenvalues and eigenvectors, but
we've already done that work earlier on in the class! The eigenstates for such a Hamiltonian are


\[ \vec n\cdot\vec\sigma\,|\vec n; \pm\rangle = \pm\,|\vec n; \pm\rangle, \]


and now we can replace $\vec\sigma$ with our spin operator $\vec S$ (just picking up a factor of $\frac{\hbar}{2}$) to get that


\[ \vec n\cdot\vec S\,|\vec n; \pm\rangle = \pm\frac{\hbar}{2}\,|\vec n; \pm\rangle. \]


These vectors $|\vec n; \pm\rangle$ are exactly the basis in which $H$ is diagonalized, so we've found our eigenstates! And the values
of our energy are just


\[ H\,|\vec n; \pm\rangle = \left(g_0 I + g\,\vec n\cdot\vec\sigma\right)|\vec n; \pm\rangle = (g_0 \pm g)\,|\vec n; \pm\rangle. \]


So we've figured out both the energies and the eigenstates for the most general two-state system: we have a state
$|\vec n; +\rangle$ with energy $g_0 + g$, as well as a state $|\vec n; -\rangle$ with energy $g_0 - g$. We don’t have to do any diagonalization
by hand, as long as we know the values of $g_0, g_1, g_2, g_3$.
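A quick numerical check of the claim that the energies are $g_0 \pm g$: the sketch below builds the matrix from Example 214 for one arbitrary, illustrative choice of coefficients and compares NumPy's eigenvalues with the closed form. The function names are hypothetical.

```python
import numpy as np

def two_state_hamiltonian(g0, g1, g2, g3):
    """The general Hermitian 2x2 matrix g0*I + g1*sigma_1 + g2*sigma_2 + g3*sigma_3."""
    return np.array([[g0 + g3, g1 - 1j * g2],
                     [g1 + 1j * g2, g0 - g3]])

def predicted_energies(g0, g1, g2, g3):
    """Closed-form energies g0 - g and g0 + g, with g the length of (g1, g2, g3)."""
    g = np.sqrt(g1 ** 2 + g2 ** 2 + g3 ** 2)
    return g0 - g, g0 + g

params = (0.3, 1.0, 2.0, 2.0)       # illustrative values; here g = 3
numeric = np.linalg.eigvalsh(two_state_hamiltonian(*params))   # ascending order
print(numeric, predicted_energies(*params))
```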

Here, $|\vec n; +\rangle$ is the excited state and $|\vec n; -\rangle$ is the ground state: there is a splitting of $2g$, so the energy gap
between our two eigenstates corresponds to twice the length of the vector $(g_1, g_2, g_3)$. And there's just one
more thing we want to do with this: time-evolving the system. But we know that the second term of $H = g_0 I + g\,\vec n\cdot\vec\sigma$
can be identified with the $\vec\omega_L\cdot\vec S$ expression we had before: we can write the second term as


\[ g\,\vec n\cdot\vec\sigma = \frac{2g}{\hbar}\,\vec n\cdot\vec S, \]
so our Larmor frequency in this case is
\[ \vec\omega_L = \frac{2g}{\hbar}\,\vec n = \frac{2}{\hbar}\,\vec g. \]


So the non-identity part of the Hamiltonian does precession, and the identity part only produces a pure phase, which
doesn’t change the direction of our state! That pure phase will give us an extra factor of $e^{-ig_0t/\hbar}$ through all of our
states, but this $g_0 I$ term is almost never important.
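The precession picture can be tested directly: evolve a state under a general two-state Hamiltonian and compare the direction of $\langle\vec\sigma\rangle$ with a rigid rotation about $\vec n$ by the angle $|\vec\omega_L|t = 2gt/\hbar$. This is a sketch with arbitrary illustrative numbers, $\hbar = 1$, and hypothetical helper names; the rotation is done with the Rodrigues formula.

```python
import numpy as np

SIGMA = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)    # sigma_1, sigma_2, sigma_3

def bloch_vector(psi):
    """Direction of the spin: the three expectation values <sigma_i>."""
    return np.array([np.vdot(psi, s @ psi).real for s in SIGMA])

def evolve(psi0, g0, g_vec, t, hbar=1.0):
    """Apply exp(-i H t / hbar) for H = g0*I + g_vec . sigma via its eigen-decomposition."""
    H = g0 * np.eye(2) + np.einsum("i,ijk->jk", g_vec, SIGMA)
    energies, states = np.linalg.eigh(H)
    U = states @ np.diag(np.exp(-1j * energies * t / hbar)) @ states.conj().T
    return U @ psi0

def rotate(vec, axis, angle):
    """Rodrigues formula: rotate vec about the unit vector axis by angle."""
    return (vec * np.cos(angle) + np.cross(axis, vec) * np.sin(angle)
            + axis * np.dot(axis, vec) * (1 - np.cos(angle)))

g0, g_vec, t = 0.4, np.array([1.0, -2.0, 2.0]), 0.37
g = np.linalg.norm(g_vec)
psi0 = np.array([0.6, 0.8j])                     # any normalized state
numeric = bloch_vector(evolve(psi0, g0, g_vec, t))
predicted = rotate(bloch_vector(psi0), g_vec / g, 2 * g * t)
print(numeric, predicted)
```

Changing `g0` leaves `numeric` untouched, which is the statement that the identity term only contributes an overall phase.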

In summary, we've taken our general Hamiltonian and identified it with the physical phenomenon of Larmor
precession. And now knowing $\vec g$ for any system will tell us how our states evolve in time: we just take our two basis
states as the $|+\rangle$ and $|-\rangle$ of some “spin,” and we can then describe a physical picture of how the state is evolving in
time.


24 March 30, 2020


Recitations are now online over Zoom. This is a bit unusual, but we'll see how we can proceed for the rest of the 
semester. 

Because we have an exam in a few hours, we can discuss a few topics related to it. First of all, the test is going
to be 3 hours long: the test opens at 12pm Eastern and stays open for 24 hours, and we have to do it in one sitting
(so we should pick an uninterrupted time). Professor Zwiebach doesn’t think it'll take us the whole time, but we can
use the full time if we keep reviewing and checking our answers.

We might worry about partial credit on this exam — the computer will grade everything as being right or wrong. 
But we'll alleviate this worry by dividing the questions in a lot of pieces, where some pieces are quite independent from 
the parts before. So we should still sometimes be able to do later parts of the problem even if we can’t do an earlier 
part. 

The formula sheet is available for us — we can print it or have it on a different screen. It’s not an exam with heavy 
use of the formula sheet, but it will be useful to have as a reference point. Usually, MITx will give us a red cross or a 
green button — this time, we will not know if we got the right answer or not. But when we enter a short answer into 
a box, it'll check the syntax to make sure it’s valid. Until we finish the exam, we can change all of our answers — it’s 
always better to save them all the time. (All of this is in the information page before we go into the exam.) 

The exam will have 4 problems: one multiple choice, one about the “mathy” things we've discussed in the early
part of the class, and two on things like oscillators, the variational principle, and spin states. A few topics that have
been discussed recently (time evolution, Heisenberg operators, coherent states, photon states, and two-state systems)
won't be relevant to this exam. Let’s do a little bit of review:


• We've studied a lot about spin-1/2 systems, discussing Pauli matrices and their properties, explicit formulas for
  spin states, and so on. In general, when we take the operator


  \[ \vec\sigma\cdot\vec n = \sigma_1 n_1 + \sigma_2 n_2 + \sigma_3 n_3 \]


  for a vector $\vec n$ of unit length, we'll have eigenvalues of $\pm 1$ corresponding to the two spin states $|\vec n; +\rangle$ and $|\vec n; -\rangle$.


• We've also discussed the matrix representation of an operator: when we act on the kth basis vector, we learn
  about the kth column of the matrix representation.


• An important idea was that of an invariant subspace. An example is that eigenvectors of our linear operators
  generate one-dimensional invariant subspaces. There may be others as well: in general, we find an invariant
  subspace $U$ by looking at a general vector $v \in U$ and seeing if $Tv$ is still in $U$. An important example of this
  is that degeneracies in the spectrum (that is, the set of eigenvalues) correspond to higher-dimensional invariant
  subspaces. For example, if there are three linearly independent eigenvectors with the same eigenvalue $\lambda$, those
  three eigenvectors generate a three-dimensional invariant subspace, corresponding to the set of vectors where
  $Tv = \lambda v$. (Basically, in this subspace, every vector gets multiplied by $\lambda$, so every vector in the subspace is an
  eigenvector.)


  On the other hand, if we take two eigenvectors $v_1, v_2$ with different eigenvalues, their span is indeed an invariant
  subspace: any linear combination of $v_1$ and $v_2$ will still be a linear combination of $v_1$ and $v_2$ after we apply our
  linear operator $T$. But this is not a nice invariant subspace, because not every vector will be an eigenvector! (In
  fact, only those along the direction of $v_1$ or $v_2$ will be.)


  And we can say something more concrete by thinking about our operators as matrices: suppose that the span
  of $e_1$ and $e_2$ is an invariant subspace in our vector space $V$. If we let a matrix (operator) act on our vector space,
  then this condition tells us something about the first two columns of the matrix: they must have all zero entries
  except for the first two rows.


• One important topic is that of diagonalization. We discussed that certain classes of nice operators are unitarily
  diagonalizable: these are the normal operators, and they include the Hermitian, anti-Hermitian, and unitary
  operators. When talking about trying to simultaneously diagonalize different operators, it’s important that they commute. An
  example of this is a system with a (spherically symmetric) central potential: we often need to find a set of states
  for each value of $\ell$, and the spectrum is degenerate in $\ell$ (there are multiple states with a given value of $\ell$). Then
  if we want to distinguish those states with a given $\ell$ (corresponding to the energy) by comparing their values of a
  different operator, we need to make sure the new operator commutes with the first one, so that diagonalization
  of one operator doesn't mess up diagonalization of the other one.


• One idea that’s worth reviewing is how we deal with degenerate sets of eigenvalues when trying to simultaneously
  diagonalize two different matrices $A$ and $B$. For example, if $A$ is diagonalized and looks like
  \[ A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix} \]
  (so there is a degeneracy with the eigenvalue $\lambda = 2$), the matrix $B$ (which commutes with $A$) will look like
  \[ B = \begin{pmatrix} * & 0 & 0 \\ 0 & * & * \\ 0 & * & * \end{pmatrix}. \]
  And to turn the $*$ parts into
  a diagonal matrix, we apply another unitary transformation to the invariant subspace for $\lambda = 2$.


• The variational principle says that we can estimate the ground state energy by calculating the expectation of the
  Hamiltonian in a test function: this gives us an upper bound on the true ground state energy.


• The trace of a matrix tr(A) is an important linear operation: it is formally defined as
  \[ \mathrm{tr}(A) = \sum_i A_{ii}, \]
  the sum of the diagonal matrix entries. It turns out that this definition doesn't depend on the basis that we use:
  even though the diagonal entries can change between different matrix representations, the
  sum of the diagonal entries is always the same. Note that when we have an orthonormal basis, we can also write
  this trace in bra-ket notation as


  \[ \mathrm{tr}(A) = \sum_i \langle i|A|i\rangle. \]


  The trace is linear: $\mathrm{tr}(aA) = a\,\mathrm{tr}(A)$, and $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$. We also have the nice property that
  $\mathrm{tr}(AB) = \mathrm{tr}(BA)$, which gives us something called cyclicity of trace:


  \[ \mathrm{tr}(ABCD) = \mathrm{tr}((ABC)(D)) = \mathrm{tr}(DABC). \]
  This also means that the trace of any commutator is zero, because
  \[ \mathrm{tr}([A, B]) = \mathrm{tr}(AB - BA) = \mathrm{tr}(AB) - \mathrm{tr}(BA) = 0. \]


  For complex-valued vector spaces, a more intrinsic way to think about the trace is that it is the sum of the
  eigenvalues. This is not true in real vector spaces, because we have cases where operators don't have any eigenvalues
  at all.
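Several of the review items above lend themselves to a quick numerical spot check. The sketch below is only illustrative: the random matrices and the particular block entries of `B_block` are made-up test data, chosen so that `B_block` commutes with the diagonal matrix from the degeneracy example.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))

# Trace identities: cyclicity, vanishing commutator trace, sum of eigenvalues.
cyclic_gap = abs(np.trace(A @ B) - np.trace(B @ A))
commutator_trace = abs(np.trace(A @ B - B @ A))
eigen_gap = abs(np.trace(A) - np.sum(np.linalg.eigvals(A)))
print(cyclic_gap, commutator_trace, eigen_gap)

# Simultaneous diagonalization when A has a degenerate eigenvalue (lambda = 2).
A_diag = np.diag([1.0, 2.0, 2.0])
B_block = np.array([[5.0, 0.0, 0.0],
                    [0.0, 1.0, 2.0 - 1.0j],
                    [0.0, 2.0 + 1.0j, 3.0]])
_, V = np.linalg.eigh(B_block[1:, 1:])   # unitary acting only inside the lambda = 2 subspace
U = np.eye(3, dtype=complex)
U[1:, 1:] = V
A_new = U.conj().T @ A_diag @ U          # unchanged: A is a multiple of the identity there
B_new = U.conj().T @ B_block @ U         # now diagonal
off_diagonal = B_new - np.diag(np.diag(B_new))
print(np.max(np.abs(off_diagonal)))
```

The key point of the second half is that the extra unitary only mixes vectors inside the degenerate subspace, so it cannot spoil the diagonal form of the first matrix.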
25 Two State Systems, Part 2


Recall that last time, we talked about general Hamiltonians for a two-state system: they're specified by four real
numbers and can be written in the form


\[ H = g_0 I + \vec g\cdot\vec\sigma = g_0 I + \vec\omega_L\cdot\vec S. \]


Here, $\vec\omega_L = \frac{2}{\hbar}\,\vec g$ is the Larmor frequency: we mentioned last time that spins will rotate with an angular velocity $|\vec\omega_L|$
around the vector $\vec\omega_L$. (In the case of a magnetic field, where $\vec\mu = \gamma\vec S$, this looks like $\vec\omega_L = -\gamma\vec B$.) There are at most two energy levels
in a two-state system, because we have a two-dimensional vector space: here the two energy levels are $g_0 \pm g$,
corresponding to the spin states $|\vec n; \pm\rangle$. And we can think of any system specified by such a Hamiltonian as having
two basis vectors, corresponding to “spin up” and “spin down,” even if the system itself has nothing to do with spins!


With that, we'll start talking about the ammonia molecule.


Fact 215


Ammonia is a molecule with chemical formula NH$_3$: it’s a colorless gas used as a fertilizer or in cleaning products.
Its shape is a flattened tetrahedron with a nitrogen atom at one corner and a base of three hydrogen atoms.


If the molecule were totally flat, the H-N-H angles would be 120 degrees, and if the molecule were a regular tetrahedron,
that angle would be 60 degrees (because we'd have an equilateral triangle). The angle turns out to be 108 degrees in
this particular molecule.

And we can think of this as a two-state system, because the nitrogen atom can be “up” above the hydrogen base 
or “down” below it. Thus, there are two configurations of this system — both states are stable — and we can think of 
this as having a potential V(z) in the z-direction (where the equilateral triangle base of hydrogens is in the xy-plane) 


that looks something like this: 


We'll try to describe this as a two-state system, and we'll need some notation for that. Let our basis states be
$|1\rangle = |{\uparrow}\rangle$, corresponding to the nitrogen atom being up, and $|2\rangle = |{\downarrow}\rangle$, corresponding to the nitrogen atom being down.
We can now write down a Hamiltonian for the system. Strictly speaking, this potential does not correspond to a two-state system
because there can be many energy eigenstates, but we can use our quantum mechanics intuition here. The ground
state is some wavefunction with two peaks (at the two points where $V = 0$), and the first excited state looks basically
like that but with one of the peaks flipped, so that we have an odd function.


If the middle barrier is high enough, we can assume that the energy levels are close to each other. Let's call the
ground state energy $E_0$, and let’s try to write down a Hamiltonian. If our basis states are $|{\uparrow}\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $|{\downarrow}\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, our
Hamiltonian can't just look like $\begin{pmatrix} E_0 & 0 \\ 0 & E_0 \end{pmatrix}$: there aren't two degenerate energy eigenstates for this one-dimensional
potential, so we need more to describe the physics here. Instead, we'll try
\[ H = \begin{pmatrix} E_0 & -\Delta \\ -\Delta & E_0 \end{pmatrix}, \]


where $\Delta > 0$. (The choice of sign doesn’t change the physics that we're using: we could flip the sign of the off-diagonal entries if we replaced
$|2\rangle$ with $-|2\rangle$.) But now $|{\uparrow}\rangle$ and $|{\downarrow}\rangle$ are no longer our energy eigenstates, so we should try to figure out how this
compares to previous models. First, let’s write the Hamiltonian as
\[ H = E_0 I - \Delta\,\sigma_1. \]


Comparing this to our generic Hamiltonian above yields the $\vec g$ vector along the x-axis (because we have the matrix
$\sigma_1$) with magnitude $\Delta$:
\[ \vec g = -\Delta\,\vec e_x \implies g = \Delta. \]


To find the ground and excited states, we just need to find the eigenvalues and eigenvectors of this matrix, and we'll
see if this matches the actual physics of the system. It turns out that the energies are $g_0 \pm g = E_0 \pm \Delta$, and the
energy gap here is $2\Delta$. So that's a good first step: we have two energy eigenvalues, $E_0 + \Delta$ and $E_0 - \Delta$, and they
correspond to eigenvectors $\frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix}$ and $\frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}$,
respectively. In terms of our basis states, this means that the energy eigenstates are
$$|E\rangle = \frac{1}{\sqrt{2}}\left(|\uparrow\rangle - |\downarrow\rangle\right), \quad |G\rangle = \frac{1}{\sqrt{2}}\left(|\uparrow\rangle + |\downarrow\rangle\right).$$

So now we can think back to how this relates to our spin states: in this nitrogen atom, only one direction matters
(the $z$-direction, corresponding to up and down orientation of the N atom), while for our spin states, three different
dimensions matter. So we need to be a bit more abstract: remember that the vector $\vec{g} = -\Delta \hat{e}_x$ points in the
$-x$ direction, and the excited and lower states correspond to $|\hat{n}; +\rangle$ and $|\hat{n}; -\rangle$, respectively, where
$\hat{n}$ is the unit vector along $\vec{g}$. So the excited state is supposed to correspond to a spin in the $+\hat{n}$ direction,
while the ground state should correspond to a spin in the $-\hat{n}$ direction. So in spin language, we can say that
$$|E\rangle = (|+\rangle - |-\rangle)/\sqrt{2}, \quad |G\rangle = (|+\rangle + |-\rangle)/\sqrt{2}.$$


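And indeed $|+\rangle + |-\rangle$ is along the positive $x$-direction in our original spin state model (pointing opposite to
$\vec{g} = -\Delta \hat{e}_x$, as the ground state should), and $|+\rangle - |-\rangle$ is along the negative $x$-direction (pointing
in the same direction as $\vec{g}$, as the excited state should).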


Fact 216 


Since this energy gap is $2\Delta$, the transition energy can be written as
$$2\Delta = \hbar\omega,$$
and this corresponds to a frequency of about $\nu = 23.827$ GHz, which corresponds to a wavelength of about 1.26 cm.
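
As a quick numerical sanity check on Fact 216, the sketch below converts the quoted frequency into a wavelength and into the gap $2\Delta = \hbar\omega = h\nu$. The function name `transition_summary` and the choice of electronvolts as the energy unit are assumptions made here for illustration; the constants are the standard SI values.

```python
import math

# Physical constants in SI units.
C = 299_792_458.0             # speed of light, m/s
H_PLANCK = 6.626_070_15e-34   # Planck constant, J*s
EV = 1.602_176_634e-19        # joules per electronvolt

def transition_summary(nu_hz: float) -> dict:
    """Convert a transition frequency nu into omega, wavelength and gap 2*Delta."""
    omega = 2 * math.pi * nu_hz        # angular frequency, rad/s
    wavelength_cm = 100 * C / nu_hz    # lambda = c / nu, converted to cm
    gap_ev = H_PLANCK * nu_hz / EV     # 2*Delta = hbar*omega = h*nu, in eV
    return {"omega": omega, "wavelength_cm": wavelength_cm, "gap_ev": gap_ev}

summary = transition_summary(23.827e9)  # frequency quoted in Fact 216
print(summary)
```

The wavelength comes out at roughly 1.26 cm, and the gap is of order $10^{-4}$ eV, which is why treating the two lowest levels as nearly degenerate is reasonable.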


So we haven't introduced too much complexity, and we already have a nice model of the ammonia molecule. 


Example 217 


How does the $|\uparrow\rangle$ state evolve in time?

(Remember that this is not a stationary state, because it is not an energy eigenvector.) The fastest way for us to
do this in principle is to think of this with spins, though it is a little painful. Recall that the Larmor frequency vector
$\vec{\omega}_L$ points in the direction of $\vec{g}$, so we have a starting vector $|\uparrow\rangle$ which begins in the $z$-direction and
precesses around $\vec{g}$, which is in the $-\hat{x}$ direction. So the state rotates in the $yz$-plane, and now we can calculate
a little bit by writing this in terms of energy eigenstates. The initial state is
$$|\psi, 0\rangle = |\uparrow\rangle = \frac{1}{\sqrt{2}}\left(|E\rangle + |G\rangle\right),$$
and we know how the energy eigenstates evolve in time (using the unitary time evolution operator $e^{-iHt/\hbar}$), so we can
then convert from $|E\rangle$ and $|G\rangle$ back to the up and down states (which is the intuition that we wanted in the first
place). The final result is that
$$|\psi, t\rangle = e^{-iE_0t/\hbar}\left(\cos\frac{\Delta t}{\hbar}\,|\uparrow\rangle + i\sin\frac{\Delta t}{\hbar}\,|\downarrow\rangle\right),$$


and we can also use this to find the probabilities of being in the up and down states: they're just the squared magnitudes,
or $\cos^2\frac{\Delta t}{\hbar}$ and $\sin^2\frac{\Delta t}{\hbar}$ respectively. In other words, this nitrogen-up molecule will rotate even
if we don't do anything, and this is happening 23 billion times a second, because it's not in a stationary eigenstate!

Note that the frequency of rotation here is $\frac{\Delta}{\hbar}$ while we have a Larmor frequency of $\frac{2\Delta}{\hbar} = \omega$, the
frequency of the photons that we found above. But there's no contradiction between the Larmor frequency and the frequency of
rotation: remember that in a spin state we had an expression with a $\cos\frac{\theta}{2}$. So the physical angle of rotation changes
twice as fast as the angle corresponding to the up and down ket vectors, and this is the same confusion with the $\frac{1}{2}$
factors that we had when we first saw spin states!
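
The time evolution in Example 217 can be reproduced numerically by attaching the phases $e^{-i(E_0 \pm \Delta)t/\hbar}$ to $|E\rangle$ and $|G\rangle$ and converting back to the up/down basis. This is only a minimal sketch: the function name `evolve_up_state` and the numerical values of $E_0$ and $\Delta$ are hypothetical, and units with $\hbar = 1$ are assumed.

```python
import cmath
import math

def evolve_up_state(t: float, e0: float, delta: float, hbar: float = 1.0):
    """Amplitudes (c_up, c_down) at time t for a molecule that starts in |up>.

    |up> = (|E> + |G>)/sqrt(2); each eigenstate picks up exp(-i * energy * t / hbar).
    """
    phase_e = cmath.exp(-1j * (e0 + delta) * t / hbar)  # |E> = (up - down)/sqrt(2)
    phase_g = cmath.exp(-1j * (e0 - delta) * t / hbar)  # |G> = (up + down)/sqrt(2)
    c_up = (phase_e + phase_g) / 2
    c_down = (phase_g - phase_e) / 2
    return c_up, c_down

e0, delta = 1.3, 0.4  # hypothetical energies in units with hbar = 1
for t in (0.0, 1.0, 2.5):
    c_up, c_down = evolve_up_state(t, e0, delta)
    # Probabilities should match cos^2(delta*t) and sin^2(delta*t).
    print(t, abs(c_up) ** 2, abs(c_down) ** 2)
```

Comparing the printed probabilities against $\cos^2(\Delta t/\hbar)$ and $\sin^2(\Delta t/\hbar)$ confirms the closed form, including the fact that the overall phase $e^{-iE_0t/\hbar}$ drops out of the probabilities.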

We'll now move on to a different time-dependent problem, that of nuclear magnetic resonance. This problem
begins when we have a magnetic field with a large component $B_0$ in the $z$-direction, plus some smaller magnetic field
rotating with some angular frequency $\omega$ in the $xy$-plane. In other words, we have
$$\vec{B} = B_0\hat{z} + B_1\left(\cos(\omega t)\hat{x} - \sin(\omega t)\hat{y}\right).$$
Our goal is to see what spins do in this field. Since $\vec{B}$ is time-dependent, it's possible that $H$ is time-dependent. We
know that
$$H_S(t) = -\gamma\vec{B}(t)\cdot\vec{S} = -\gamma\left[B_0S_z + B_1\left(S_x\cos(\omega t) - S_y\sin(\omega t)\right)\right].$$
And indeed $H$ is time-dependent, and in fact the Hamiltonian doesn't even commute with itself at different times! (Sometimes
we have $S_z$ and $S_x$, and at other times we have $S_z$ and $S_y$.) So we need to figure out this problem in a new way.
We'll start by trying to get the main intuition for what's going on. Our Schrodinger equation is very complicated:
it has a time-dependent Hamiltonian, and we have the equation
$$i\hbar\partial_t|\psi\rangle = H_S|\psi\rangle.$$
We'll try to change the Hamiltonian while keeping the physics: one way we can do that is by applying a unitary
transformation $U$ to our states $|\psi\rangle$, and we'll hope that the Hamiltonian acting on these new states $|\psi'\rangle$ is
simpler. Unitary transformations are basically just a change of basis unless the unitary transformation has
time-dependence, and then we even mess up the $\partial_t$ term on the left side. But this is our only real chance of getting a
time-independent Hamiltonian out of something of the form $UH_SU^\dagger$, so we'll go ahead with this idea.


Example 218 


Suppose we start with a system with a Hamiltonian of 0.

Since we have a rotating magnetic field in our original problem, we'll think about what happens to our physics in
this “nothing is happening” system if we have the $xy$-plane rotating with angular velocity $\omega$. In our “nothing” system,
$H_S = 0$, and any spin state stays in place. But when we jump into our rotating frame, all of the spin states that started
off being static are now rotating: they're precessing around the $z$-axis! So there is some nonzero Hamiltonian in
our rotating frame: it should be such that the spins rotate around the $z$-axis with angular velocity $\omega$, and this is done
by the unitary (rotation) operator
$$U = e^{-i\omega tS_z/\hbar}.$$
Thus, the rotating Hamiltonian must be
$$H_U = \omega S_z$$
(to make the unitary operator $e^{-iH_Ut/\hbar}$). Let's now think about how our Hamiltonian changes when we don't just start
with 0, and we'll do this with a different calculation:


Example 219 


Suppose we have a rotating wavefunction defined by
$$|\psi_R\rangle = U|\psi\rangle.$$
What is the Schrodinger equation for $\psi_R$ if we know the equation for $\psi$?


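We start with the usual equation
$$i\hbar\partial_t|\psi\rangle = H_S|\psi\rangle.$$
Evaluating the left hand side for $\psi_R$, we have
$$i\hbar\partial_t|\psi_R\rangle = i\hbar\partial_t(U|\psi\rangle) = i\hbar(\partial_tU)|\psi\rangle + i\hbar U\partial_t|\psi\rangle$$
by the product rule, and then we can simplify the first term by inserting a $U^\dagger U$ between $\partial_tU$ and $|\psi\rangle$ and
simplify the second by applying the Schrodinger equation:
$$= i\hbar(\partial_tU)U^\dagger|\psi_R\rangle + UH_S|\psi\rangle.$$
But now we put another $U^\dagger U$ in the second term and we'll find that we have a new Schrodinger equation:
$$i\hbar\partial_t|\psi_R\rangle = \left(UH_SU^\dagger + i\hbar(\partial_tU)U^\dagger\right)|\psi_R\rangle.$$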


This is essentially the “rotating Hamiltonian” that we've been trying to figure out, and this is what we hope is simpler
than our original $H_S$! The first term $UH_SU^\dagger$ corresponds to a “similarity transformation” of the Hamiltonian, and the
second term has to do with the time-dependence of $U$ affecting the original left side of the Schrodinger equation. Recall the
above argument: when $H_S = 0$, we want our new Hamiltonian to be $\omega S_z$, so we'll pick a $U$ such that
$$i\hbar(\partial_tU)U^\dagger = \omega S_z \implies U = e^{-i\omega tS_z/\hbar},$$
which is indeed the rotation transformation we had above! So our new Hamiltonian is
$$H_R = UH_SU^\dagger + i\hbar(\partial_tU)U^\dagger, \quad U = e^{-i\omega tS_z/\hbar},$$
and our original state is recovered as
$$|\psi, t\rangle = e^{i\omega tS_z/\hbar}|\psi_R, t\rangle$$
by taking the inverse of the unitary operator. So we now have a problem for $|\psi_R\rangle$ with a Schrodinger equation involving
a Hamiltonian $H_R$ where the second term is just $\omega S_z$: our hope now is that we have a time-independent Hamiltonian
in this new rotating frame.


Example 220 


Before we look at time-independence, here's another way we can find the Schrodinger Hamiltonian for $|\psi_R, t\rangle$,
our rotating wavefunction.


We can say that
$$|\psi_R, t\rangle = U(t)U_S(t)|\psi, 0\rangle,$$
where $U_S$ is the unitary operator for the spin system itself (associated to the ordinary Hamiltonian). Then $UU_S$ is the
total unitary operator that evolves the state, and then we know that
$$H_R = i\hbar\,\partial_t\big(U(t)U_S(t)\big)\big(U(t)U_S(t)\big)^\dagger,$$
because the Hamiltonian associated to any unitary time-evolution operator $A$ is $i\hbar(\partial_tA)A^\dagger$. Then we just evaluate the
derivative by the product rule, which yields
$$= i\hbar(\partial_tU)U^\dagger + U(t)\left[i\hbar\big(\partial_tU_S(t)\big)U_S^\dagger(t)\right]U^\dagger(t).$$
And now the middle of this second term is the Hamiltonian associated to $U_S$, and either way this means we can write
down the following formula which summarizes everything:
$$\boxed{H_R = H_U + U(t)H_S(t)U^\dagger(t)},$$
where $H_R$ is the rotated Hamiltonian, $H_U$ is the Hamiltonian associated to the unitary operator $U$, and $H_S$ is our
original Schrodinger Hamiltonian.
So let's bring this back to our original example. We chose $U = e^{-i\omega tS_z/\hbar}$ so that $H_U$ is just $\omega S_z$, and now plugging
everything in (including our original Hamiltonian), we find that
$$H_R = \omega S_z + e^{-i\omega tS_z/\hbar}\left[-\gamma\left(B_0S_z + B_1\left(\cos(\omega t)S_x - \sin(\omega t)S_y\right)\right)\right]e^{i\omega tS_z/\hbar}.$$
To simplify this further, the $B_0S_z$ term can go outside of the conjugation by $e^{-i\omega tS_z/\hbar}$ (also called a similarity
transformation) because both involve only $S_z$, so they commute. This contributes a term of $-\gamma B_0S_z$, and then the
rest looks like
$$-\gamma B_1\,\boxed{e^{-i\omega tS_z/\hbar}\left[\cos(\omega t)S_x - \sin(\omega t)S_y\right]e^{i\omega tS_z/\hbar}}.$$


There are two ways we can simplify from here: since we have two exponentials, we can expand them and multiply,
or we can use the formula for $e^{A}Be^{-A}$. But here's another idea: we know the function $U(t)$, so $\psi$ and $\psi_R$ determine
each other, and solving for $\psi_R$ is enough as long as we can make $H_R$ less complicated. So we'll call that boxed term
$M(t)$, and we'll take its time derivative (this is a good idea if we're in a rush). Then we use the product rule: the
derivatives of the outside terms bring factors down from the exponentials, which gives us a commutator, and the other
term is just evaluating the derivative in the middle:
$$\frac{d}{dt}M(t) = e^{-i\omega tS_z/\hbar}\left(-\frac{i\omega}{\hbar}\left[S_z, \cos(\omega t)S_x - \sin(\omega t)S_y\right] - \omega\sin(\omega t)S_x - \omega\cos(\omega t)S_y\right)e^{i\omega tS_z/\hbar}.$$
Now we just evaluate the commutator with our known relations:
$$= e^{-i\omega tS_z/\hbar}\left(-\frac{i\omega}{\hbar}\left(i\hbar S_y\cos(\omega t) + i\hbar S_x\sin(\omega t)\right) - \omega\sin(\omega t)S_x - \omega\cos(\omega t)S_y\right)e^{i\omega tS_z/\hbar},$$
and now all of the terms cancel, and we're just left with zero! This means that $M(t)$ has no time dependence, since
its derivative is zero, and now we can just evaluate it at $t = 0$. Then the exponentials disappear, and the whole boxed
expression is just $S_x$!
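
The claim that the boxed term is just $S_x$ at all times can be spot-checked numerically for spin 1/2. The sketch below is illustrative only: it assumes units with $\hbar = 1$, uses $S_i = \frac{\hbar}{2}\sigma_i$, and the helper names (`rot_z`, `boxed_term`, `max_deviation_from_sx`) are made up for this example.

```python
import cmath
import math

HBAR = 1.0  # units with hbar = 1 (an assumption of this sketch)

def matmul(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

SX = [[0, HBAR / 2], [HBAR / 2, 0]]
SY = [[0, -1j * HBAR / 2], [1j * HBAR / 2, 0]]

def rot_z(angle):
    """exp(-i * angle * S_z / hbar) for spin 1/2; diagonal because S_z is diagonal."""
    return [[cmath.exp(-1j * angle / 2), 0], [0, cmath.exp(1j * angle / 2)]]

def boxed_term(omega, t):
    """M(t) = U [cos(wt) S_x - sin(wt) S_y] U^dagger with U = exp(-i w t S_z / hbar)."""
    c, s = math.cos(omega * t), math.sin(omega * t)
    middle = [[c * SX[i][j] - s * SY[i][j] for j in range(2)] for i in range(2)]
    return matmul(matmul(rot_z(omega * t), middle), rot_z(-omega * t))

def max_deviation_from_sx(omega, t):
    """Largest entrywise difference between M(t) and S_x."""
    m = boxed_term(omega, t)
    return max(abs(m[i][j] - SX[i][j]) for i in range(2) for j in range(2))

print(max_deviation_from_sx(omega=2.0, t=0.37))  # should be at rounding-error level
```

This only tests the spin-1/2 representation; the commutator argument in the text is what shows the result for any spin.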


So plugging everything back in gives us our final rotated Hamiltonian:
$$H_R = (-\gamma B_0 + \omega)S_z - \gamma B_1S_x.$$
This now just has two pieces: the $S_z$ coefficient got an extra $\omega$ term from the rotation, and the rotating $xy$-magnetic
field just became a static one in the $x$-direction (as it was at time 0).

To make this look nicer, $\omega_0 = \gamma B_0$ is the Larmor frequency for $B_0$, so this can also be written as
$$H_R = -\gamma\left[\left(B_0 - \frac{\omega}{\gamma}\right)S_z + B_1S_x\right] = -\gamma\left[B_0\left(1 - \frac{\omega}{\omega_0}\right)S_z + B_1S_x\right].$$
So now we can think of this Hamiltonian as being in the form $-\gamma\vec{B}_R\cdot\vec{S}$, where the effective magnetic field is
$$\vec{B}_R = B_0\left(1 - \frac{\omega}{\omega_0}\right)\hat{z} + B_1\hat{x}.$$
So we can finally answer our original problem: we wanted to know how a state time-evolves, and we just have
$$|\psi, t\rangle = U^\dagger|\psi_R, t\rangle = e^{i\omega tS_z/\hbar}e^{-iH_Rt/\hbar}|\psi, 0\rangle,$$
where the whole point of all of this is that the second exponential is very simple because $H_R$ is time-independent!
Plugging in the value we know for $H_R$, this gives us the equation
$$|\psi, t\rangle = e^{i\omega tS_z/\hbar}e^{i\gamma(\vec{B}_R\cdot\vec{S})t/\hbar}|\psi, 0\rangle.$$
(Remember that the states $\psi_R$ and $\psi$ are the same at $t = 0$.) This is the complete solution for our rotating spin
problem!


With this, it's time for us to talk about applications. In practical examples, we always have $B_1 \ll B_0$.


Example 221 


Let's look at the case where $\omega \ll \omega_0$.

The frequency $\omega_0$ is the Larmor frequency, and because $B_0$ is very large, this means $\omega_0$ is also very large. So it's
reasonable for $\omega$ to be very small by comparison (the $\vec{B}$ field rotates very slowly compared to the precession that it
is creating). In such a case, we can approximate
$$\vec{B}_R \approx B_0\hat{z} + B_1\hat{x}.$$
So this magnetic field is mostly along the $z$-axis, and let's also look at the case where the spin is up in the $+z$
direction at time $t = 0$. This magnetic field will rotate our spins around the axis of $\vec{B}_R$, and since $\vec{B}_R$ is very close
to $\hat{z}$, the path is a small cone near the $z$-axis. But we should make sure not to forget the $e^{i\omega tS_z/\hbar}$ term, which is also
producing a rotation around the $z$-axis. In this case, since $\omega \ll \omega_0$, the “rotating cone” behavior is much faster than
the precession of the whole process around the $z$-axis.


Example 222


Let's now do the resonance case, where $\omega = \omega_0$.

Basically, we know what $\omega_0$ looks like for the spins themselves, and then we set up our system so that $\omega$ completely
lines up with $\omega_0$. Then the $\hat{z}$ component of the magnetic field disappears: we just have $\vec{B}_R = B_1\hat{x}$, which means that
our spin state now precesses around the $x$-direction. (Since we're actually missing a negative sign here, the spin rotates
around the $-\hat{x}$ direction.) Now $B_1 \ll B_0$, meaning that the rotation operator $e^{i\omega tS_z/\hbar}$ will rotate our spin around the
$z$-axis faster than it precesses around the $x$-axis: thus we create a spiral!

And now as the spin fills out the spiral, it's an interesting question to time the signals: we care about when the
spin is perpendicular to the original direction (so in the $xy$-plane for the first time). To do this, we'll choose
$$\omega_1T = \frac{\pi}{2},$$
where $\omega_1 = \gamma B_1$ is the Larmor frequency associated with $B_1$. A pulse of duration $T = \frac{\pi}{2\gamma B_1}$ is called the
90 degree pulse: after this much time, the spin has gone from the $z$-axis to the equator of the sphere, and the $B_1$ term
is negligible for a while. (As an exercise, it's worth figuring out the spiral equation that comes out of all of this!)
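
The 90 degree pulse can be checked against the full solution $|\psi, t\rangle = e^{i\omega tS_z/\hbar}e^{i\gamma(\vec{B}_R\cdot\vec{S})t/\hbar}|\psi, 0\rangle$ for a spin-1/2 particle that starts in $|+\rangle$. The following is a rough sketch under stated assumptions: units with $\hbar = 1$, hypothetical field values, and the standard identity $e^{i\phi\,\hat{n}\cdot\vec{\sigma}} = \cos\phi\,\mathbb{1} + i\sin\phi\,\hat{n}\cdot\vec{\sigma}$. The function names are invented for this example.

```python
import cmath
import math

def lab_frame_amplitudes(t, gamma, b0, b1, omega):
    """Spin-1/2 amplitudes (c_plus, c_minus) at time t, starting from |+> at t = 0.

    Uses B_R = B0 (1 - w/w0) z + B1 x and S = (hbar/2) sigma, so that
    exp(i gamma (B_R . S) t / hbar) = cos(phi) 1 + i sin(phi) n.sigma.
    """
    omega0 = gamma * b0
    bz, bx = b0 * (1 - omega / omega0), b1      # components of B_R
    norm = math.hypot(bz, bx)
    nz, nx = bz / norm, bx / norm               # unit vector along B_R
    phi = gamma * norm * t / 2
    c_plus = math.cos(phi) + 1j * math.sin(phi) * nz
    c_minus = 1j * math.sin(phi) * nx
    # Undo the rotating frame: exp(i w t S_z / hbar) is diagonal in the S_z basis.
    return (cmath.exp(1j * omega * t / 2) * c_plus,
            cmath.exp(-1j * omega * t / 2) * c_minus)

def sz_expectation(c_plus, c_minus, hbar=1.0):
    """<S_z> for the state c_plus |+> + c_minus |->."""
    return hbar / 2 * (abs(c_plus) ** 2 - abs(c_minus) ** 2)

gamma, b0, b1 = 1.0, 50.0, 1.0        # hypothetical values with B1 << B0
t_90 = math.pi / (2 * gamma * b1)     # duration of the 90 degree pulse
on_resonance = sz_expectation(*lab_frame_amplitudes(t_90, gamma, b0, b1, omega=gamma * b0))
off_resonance = sz_expectation(*lab_frame_amplitudes(t_90, gamma, b0, b1, omega=0.0))
print(on_resonance, off_resonance)
```

On resonance $\langle S_z\rangle$ drops to zero at $t = T$ (the spin reaches the equator), while far off resonance the spin barely leaves the $z$-axis, matching the small-cone picture of Example 221.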

This turns out to be the technique used for magnetic resonance imaging (MRIs), which is one of the interesting 
applications of quantum mechanics to technology. This device goes beyond what we can do with an x-ray: basically, 
a person is put inside a solenoid with a magnetic field of 2 Tesla. (It’s not dangerous, but if we forget metal or have 
iron ink in a tattoo, that can cause some problems.) 

The purpose of this MRI is to figure out the local concentration of water. The magnetic field from the solenoid
interacts with the magnetic dipole moments of the protons in hydrogen atoms, and these protons get roughly aligned
to this $B_0$ magnetic field. (Not all of the protons get aligned, maybe just one in a million, but that's enough.)
But then this 90 degree pulse is sent, so the proton spins will start spiraling, and this rotating dipole moment will generate
electromagnetic waves. The MRI's detectors then pick up this signal: the strength of that signal is proportional to
the concentration of water (or other kinds of liquid) that we have.

This is useful because we can compare signals from different areas: we can then distinguish different kinds of tissues 
(some have more water than others). 

But this rotation of the proton has a relaxation time $T_2$ for the rotation (in which the spin interacts with other
spins), and there is also a time $T_1$ that it takes for the spin to return to its original position (in which the atom interacts
with a set of neighboring atoms). These two measurements, $T_1$ and $T_2$, are very good for our applications, because
we can measure any liquid's $T_1$ and $T_2$ and compare it to the numbers that we measure in our own MRI! For example,
the value of $T_2$ is good enough to distinguish white matter, grey matter, and fluids in our brain.

And one final note: an MRI often makes large noises when we first go into the machine. This comes from gradient
magnets: the value of $B_0$ is adjusted as a function of position, which also changes the $\omega_0$ for our spins. This technology
has gotten sophisticated enough that we can get spatial resolution: we can tell where a signal is coming from in our
body, up to a resolution of half a millimeter. (This is a junior lab experiment as well!)


26 April 1, 2020 


We've now finished with the first exam. It was mentioned in an email what we need to do to pass the class this
semester with a PE grade (60 percent or above). There are lots of different parts of this class, and that boundary can
possibly be lowered but not raised.

In the next few days, we'll do an anonymous poll to see how we felt about the exam — feedback is always appreciated! 
Doing things online definitely requires different skills, so there’s something new for all of us. The second midterm and 
final will be in this kind of format as well; the idea is to mitigate any difficulties before those tests. 

In this recitation, we'll talk about some of the concepts needed to move forward with this class. There are several
things we'll have to discuss, but let's talk about the main subject. Since we last discussed recent material, we've
explored a lot about unitary time evolution: we know that we can write the wavefunction at a time $t$ as
$$|\psi, t\rangle = U(t, t_0)|\psi, t_0\rangle.$$
It is important to emphasize that this works for all $t$ and $t_0$, and in fact we can use any wavefunction $\psi$ in our system
and we'll have the same unitary operator $U$.


There are a few other important properties that we should remember about our unitary time-evolution operator:
• $U(t_0, t_0) = I$ is the identity operator.

• Composition of times works as nicely as we'd like:
$$U(t_2, t_1)U(t_1, t_0) = U(t_2, t_0).$$
(Going from time $t_0$ to $t_1$ to $t_2$ is the same as skipping over the middle time.) And we should remember that if
we plug in $t_2 = t_0$, we find that the inverse of $U(t_1, t_0)$ is $U(t_0, t_1)$, which makes sense. This can also be
written (because a unitary operator has $U^\dagger = U^{-1}$) in the form
$$\left(U(t_0, t_1)\right)^\dagger = U(t_1, t_0).$$
• We can find the Hamiltonian associated to the unitary operator $U$:
$$H = i\hbar\left(\frac{\partial U}{\partial t}(t, t_0)\right)U(t_0, t).$$
(We might have seen this last factor usually written as $U^\dagger(t, t_0)$.) Usually, we have a lot of intuition on how
to write Hamiltonians, but we have less intuition on how to write unitary operators, so we often go from $H$ to
$U$. But if we know how the system evolves in time, then we can use this tool to reconstruct the Hamiltonian
(go from $U$ to $H$)! And we can use the last equation as a differential equation for $U$: multiplying both sides on the
right by $U(t, t_0)$ yields
$$i\hbar\frac{\partial U}{\partial t}(t, t_0) = HU(t, t_0).$$


• There are some important special cases where it's easy to solve for the unitary time evolution operator. If $H$ is
time-independent, then we can write
$$U(t, t_0) = e^{-iH(t - t_0)/\hbar},$$
and if $H$ isn't necessarily time-independent but still commutes with itself at different times, we have
$$U(t, t_0) = \exp\left(-\frac{i}{\hbar}\int_{t_0}^{t}H(t')\,dt'\right).$$
This property of the Hamiltonian makes it much simpler, so it's a good thing. The reason this formula doesn't
work in general for a Hamiltonian that doesn't commute at different times can be seen if we try to write out this
exponential as a power series: then we have terms of the form
$$\left(-\frac{i}{\hbar}\int_{t_0}^{t}H(t')\,dt'\right)\left(-\frac{i}{\hbar}\int_{t_0}^{t}H(t')\,dt'\right)\left(-\frac{i}{\hbar}\int_{t_0}^{t}H(t')\,dt'\right).$$
Then when we take the derivative and use the product rule, we get $H(t)$ terms replacing one of the factors in
this product. But in order for us to factor this nicely, we need to be able to move $H(t)$ past the integrals, so we
need $H$ to commute at different times! The main idea is that in general, we do not have
$$\frac{d}{dt}e^{M(t)} = \dot{M}(t)e^{M(t)}$$
for a matrix $M(t)$.
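
To make the obstruction in the last bullet concrete, here is the second-order term of the power series worked out explicitly. The symbol $M(t)$ is used as shorthand for the exponent $-\frac{i}{\hbar}\int_{t_0}^{t}H(t')\,dt'$; this is only a sketch of the lowest order at which the problem shows up.

```latex
\begin{align*}
M(t) &= -\frac{i}{\hbar}\int_{t_0}^{t} H(t')\,dt', \qquad \dot{M}(t) = -\frac{i}{\hbar}H(t), \\
\frac{d}{dt}\,\frac{M^2}{2} &= \frac{1}{2}\left(\dot{M}M + M\dot{M}\right)
  = \dot{M}M + \frac{1}{2}\,[M, \dot{M}].
\end{align*}
```

The first piece is what $\dot{M}e^{M}$ would produce at this order. The leftover commutator $[M, \dot{M}]$ is proportional to $\int_{t_0}^{t}[H(t'), H(t)]\,dt'$, so it vanishes when $H$ commutes with itself at different times and generally does not otherwise.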


Unitary operators are used to help us transition from the Schrodinger picture to the Heisenberg picture of quantum
mechanics. The idea is to start with an expectation value (or a matrix element if we'd like to think of it that way)
$$\langle\psi_1, t|A_S|\psi_2, t\rangle.$$
When we think about Schrodinger operators, we should just think $\hat{x}$, $\hat{p}$, and so on: usually these are time-independent,
and we can have Schrodinger operators with time-dependence only if we write in a specific time $t$. By time evolution,
we know that we can write the above expression as
$$\langle\psi_1, 0|U^\dagger(t, 0)A_SU(t, 0)|\psi_2, 0\rangle.$$


We call this middle term the Heisenberg operator $A_H(t) = U^\dagger(t, 0)A_SU(t, 0)$: it's a similarity transformation of our
usual operator $A_S$. This has a few nice properties for us to remember:

• $A_H = A_S$ at time $t = 0$.

• The “algebra of operators” is preserved, because of the way that $U$ is acting on our operators. For example, the
identity operator and the commutators stay the same. Specifically, if we have a Lie algebra, then $[A_S, B_S] = C_S$
means that $[A_H(t), B_H(t)] = C_H(t)$, and also that $(AB)_H = A_HB_H$. We can check this ourselves:
$$(AB)_H = U^\dagger ABU = U^\dagger AUU^\dagger BU = A_HB_H.$$

• A Schrodinger Hamiltonian of the form $H(\hat{p}, \hat{x}; t)$ gives rise to a Heisenberg Hamiltonian $H_H = H(\hat{p}_H(t), \hat{x}_H(t); t)$.
And this statement is true for any operator that depends on just these variables: basically, to get the Heisenberg
Hamiltonian, we just plug in the Heisenberg versions of our operators.

• One special property for the Hamiltonian, though: if $[H_S(t), H_S(t')] = 0$ (that is, the Schrodinger Hamiltonian
commutes at different times), then we actually have $H_H(t) = H_S(t)$ for all $t$. In other words, they are identical
operators: they are the same function of time, and in particular passing to the Heisenberg picture adds no
time-dependence to the Hamiltonian.

• We have the important Heisenberg equation of motion
$$i\hbar\frac{dA_H}{dt} = [A_H(t), H_H(t)] + i\hbar\left(\frac{\partial A_S}{\partial t}\right)_H.$$
The last term is often irrelevant: it means we're taking the Heisenberg version of the time-derivative of our
Schrodinger operator (note that this is not the same as the time-derivative of the Heisenberg operator). And
often $A_S$ doesn't depend explicitly on time, so that last term just goes away. And an explanation for


An important example of this Heisenberg formulation is the simple harmonic oscillator. We derived that the
Heisenberg operators for $\hat{x}$ and $\hat{p}$ look like
$$\hat{x}_H(t) = \hat{x}\cos(\omega t) + \frac{1}{m\omega}\hat{p}\sin(\omega t), \quad \hat{p}_H(t) = \hat{p}\cos(\omega t) - m\omega\hat{x}\sin(\omega t).$$
We can check that this gives us Heisenberg operators for the creation and annihilation operators
$$\hat{a}_H(t) = e^{-i\omega t}\hat{a}, \quad \hat{a}_H^\dagger(t) = e^{i\omega t}\hat{a}^\dagger.$$
This is fairly fundamental, so we should make sure we can follow along with all of the logic here.
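
As a sketch of that check for the annihilation operator, assume the standard convention $\hat{a} = \sqrt{\frac{m\omega}{2\hbar}}\left(\hat{x} + \frac{i}{m\omega}\hat{p}\right)$ (this definition is not restated in this section, so it is an assumption here) and substitute the Heisenberg operators for $\hat{x}$ and $\hat{p}$:

```latex
\begin{align*}
\hat{a}_H(t) &= \sqrt{\frac{m\omega}{2\hbar}}\left(\hat{x}_H(t) + \frac{i}{m\omega}\hat{p}_H(t)\right) \\
&= \sqrt{\frac{m\omega}{2\hbar}}\left[\hat{x}\left(\cos\omega t - i\sin\omega t\right)
   + \frac{i}{m\omega}\hat{p}\left(\cos\omega t - i\sin\omega t\right)\right] \\
&= e^{-i\omega t}\,\hat{a}.
\end{align*}
```

Taking the adjoint gives the creation operator with the opposite phase, $e^{i\omega t}\hat{a}^\dagger$.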


We'll close with a few remarks about coherent states: this concept arises out of the translation operator
$$T_{x_0} = e^{-i\hat{p}x_0/\hbar},$$
which is a unitary operator. (In general, any operator of the form $e^{iA}$, where $A$ is Hermitian, is a unitary operator.)
So $T_{x_0}$'s inverse is also its adjoint:
$$(T_{x_0})^\dagger = T_{-x_0},$$
and we can also combine exponentials because they all commute:
$$T_{x_0}T_{x_1} = T_{x_0 + x_1}.$$
There are a few ways of justifying the name “translation operator”: recall that
$$T_{x_0}^\dagger\,\hat{x}\,T_{x_0} = \hat{x} + x_0I,$$
and therefore
$$\langle\hat{x}\rangle_{T_{x_0}\psi} = \langle\hat{x}\rangle_\psi + x_0, \quad \boxed{T_{x_0}|x\rangle = |x + x_0\rangle}.$$
(Note, though, that $\langle x|T_{x_0}$ is $\langle x - x_0|$ instead, because we can take the dagger of the above boxed expression and
then replace $x_0$ with $-x_0$.) With this, we define the coherent state with label $x_0$ to be
$$|\tilde{x}_0\rangle = T_{x_0}|0\rangle = e^{-i\hat{p}x_0/\hbar}|0\rangle.$$
We can verify that the wavefunction is just the ordinary ground state wavefunction translated by $x_0$:
$$\psi_{x_0}(x) = \psi_0(x - x_0).$$
This is because the wavefunction is defined to be $\langle x|\tilde{x}_0\rangle$, and then we just expand this out with the definition:
$$\langle x|\tilde{x}_0\rangle = \langle x|T_{x_0}|0\rangle = \langle x - x_0|0\rangle = \psi_0(x - x_0).$$


We've only gotten through some of the new ideas, and we'll continue to work towards catching up on our concepts 


over the next few recitations. 


27 Multiparticle States, Part 1


We're now shifting to more complicated systems: for example, suppose we have two particles in a system (it doesn't
matter yet whether they're distinguishable or indistinguishable; that's more of an 8.06 topic). Then the two particles
can be described with their own physics: particle 1 might have a complex vector space $V$ for its state space, along
with some operators $T_1, T_2, \ldots$, and particle 2 might have some other vector space $W$, along with some operators
$S_1, S_2, \ldots$. (These operators are things like position, momentum, and so on.) Our first question will be how to
describe the composite system, that is, the system of the two particles together, especially when the two particles can
interact with each other.

Since particle 1 is described by some $v \in V$, and particle 2 is described by some $w \in W$, it's reasonable to imagine
that $(v, w)$ describes the composite system. It turns out that this is a bit naive (it doesn't represent everything we
want in our system just yet), but we do need to encode the two systems together.

So we'll use a specific notation: we'll encode those pairs of vectors as $v \otimes w$, where $\otimes$ represents a tensor product.
Here, we're not multiplying the two vectors in any obvious way: we're just saying that this is an object that puts
together our information from $V$ and from $W$. This object $v \otimes w$ is going to be an element of the new (complex)
vector space $V \otimes W$, called the tensor product of the two vector spaces.

Let's try to extract some properties for this object that we've just introduced. We know that states can have
constants in front of them, so we'll allow ourselves to put constants in front of the $v$: this gives us $(av) \otimes w$. We
want to relate this object to $v \otimes w$; otherwise, we have a much larger space, and we get what's mathematically called
a direct product. Essentially, we don't want $(av) \otimes w$ to be linearly independent of $v \otimes w$ (since $av$ and $v$ are the
same state in $V$), so we're going to say that the $a$s can come out of the product:
$$(av) \otimes w = a(v \otimes w) = v \otimes (aw).$$
(Notably, these don't come out with a complex conjugate.) We can impose this property on the object we defined,
and now we can make some more progress: if $v_1 \otimes w_1$ and $v_2 \otimes w_2$ are two vectors in our tensor product vector space
$V \otimes W$, any linear combination of them should also be in the space:
$$\alpha(v_1 \otimes w_1) + \beta(v_2 \otimes w_2) \in V \otimes W.$$


Notice now that we can't just treat our $v$ and $w$ separately from each other in the tensor product, because quantum
mechanics now seems to require us to allow superpositions of $v_1 \otimes w_1$ and $v_2 \otimes w_2$, and there's some kind
of connection between the two particles in general! This is where entanglement comes from, and we'll see that soon.

And there's one more constraint we need to impose: for the sake of linearity, we'll also say that
$$(v_1 + v_2) \otimes w = v_1 \otimes w + v_2 \otimes w.$$
The reason for this is that both sides of this equation represent the first particle being in a superposition of one of
two possibilities, while the second particle is in some specific state. Again, this is different from the direct product, in
which we just put the two vectors side by side. In that case, we would add the two $w$s together as well, which isn't
what we want to do here. And similarly, we'll want to say that
$$v \otimes (w_1 + w_2) = v \otimes w_1 + v \otimes w_2,$$
and now we have all of the axioms we need for our tensor product: just read off the equations above.

To add a bit more intuition for this, the space $V \otimes W$ is spanned by vectors of the form $v_i \otimes w_j$. Specifically, if we
choose a basis $(e_1, \ldots, e_n)$ for $V$ and a basis $(f_1, \ldots, f_m)$ for $W$, then we have a basis for $V \otimes W$ of the form $e_i \otimes f_j$.
Since there are $mn$ such vectors of this form, we multiply the dimensions for a tensor product, not add them:
$$\dim(V \otimes W) = (\dim V)(\dim W).$$
Indeed, because of the axioms that we introduced, we can get any vector in $V$ on the left and any vector in $W$ on the
right! (This, for example, wouldn't have been possible with a direct product.)
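
One concrete way to see these axioms at work is to represent $v \otimes w$ by its components in the basis $e_i \otimes f_j$, which is just the list of all products $v_iw_j$. The sketch below uses that representation; the helper names (`tensor`, `add`, `scale`, `close`) and the sample vectors are hypothetical choices for illustration.

```python
def tensor(v, w):
    """Components of v (x) w in the basis e_i (x) f_j, with the W index varying fastest."""
    return [vi * wj for vi in v for wj in w]

def add(x, y):
    """Componentwise sum of two vectors of equal length."""
    return [a + b for a, b in zip(x, y)]

def scale(a, x):
    """Multiply every component of x by the scalar a."""
    return [a * xi for xi in x]

def close(x, y, tol=1e-12):
    """True if two component lists agree up to rounding error."""
    return len(x) == len(y) and all(abs(p - q) < tol for p, q in zip(x, y))

v1, v2 = [1 + 2j, 0.5], [3, -1j]   # hypothetical vectors in V = C^2
w = [2, 1j, -1]                    # hypothetical vector in W = C^3
a = 2 - 1j

print(len(tensor(v1, w)))                                          # dim V * dim W
print(close(tensor(scale(a, v1), w), scale(a, tensor(v1, w))))     # (a v) (x) w = a (v (x) w)
print(close(tensor(v1, scale(a, w)), scale(a, tensor(v1, w))))     # v (x) (a w) = a (v (x) w)
print(close(tensor(add(v1, v2), w), add(tensor(v1, w), tensor(v2, w))))  # linearity in v
```

Note that the constant comes out without a complex conjugate in both slots, and that the component list has length $2 \cdot 3 = 6$, not $2 + 3$ as it would for a direct product.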




Fact 223 


There are a lot of subtle facts about this tensor product, so it might feel at some points that we are taking a long 


time to explain things, and it might feel at others that something is confusing. 


We'll now try to introduce operators to our space $V \otimes W$. Say that $T$ is an operator on $V$, and $S$ is an operator
on $W$: we'll define an operator $T \otimes S \in \mathcal{L}(V \otimes W)$, and let's see what properties this must have. It suffices to show
how $T \otimes S$ will act on any element of the form $v \otimes w$, and then we'll be able to extend it to any superposition of
such vectors by linearity. We're going to make a definition, but it won't have very much to do with anything else we're
talking about today: the most natural way is to say that
$$(T \otimes S)(v \otimes w) = (Tv) \otimes (Sw).$$
In other words, everything acts in the space where it can, and there isn't much more to say here. Since $T$ and $S$ are
linear, $T \otimes S$ will be linear as well.

But now suppose $T_1 \in \mathcal{L}(V)$, and we want to get an operator on $V \otimes W$ without having an operator on $W$.
Then we'll need to upgrade our operator by just using the identity operator on $W$: we end up with the object
$T_1 \otimes \mathbb{1} \in \mathcal{L}(V \otimes W)$. (Similarly, we can upgrade an operator $S_1 \in \mathcal{L}(W)$ by turning it into $\mathbb{1} \otimes S_1$.) And now one
important idea is that these two operators will commute:
$$(T_1 \otimes \mathbb{1})(\mathbb{1} \otimes S_1)(v \otimes w) = (T_1 \otimes \mathbb{1})(v \otimes S_1w) = T_1v \otimes S_1w,$$
and similarly
$$(\mathbb{1} \otimes S_1)(T_1 \otimes \mathbb{1})(v \otimes w) = (\mathbb{1} \otimes S_1)(T_1v \otimes w) = T_1v \otimes S_1w.$$
Essentially, operators that originate from different particles always commute: “they don't know anything about each
other.” So this is helpful, because when the two particles don't interact, the Hamiltonian of the whole system $H_T$ is just
$$H_T = H_1 \otimes \mathbb{1} + \mathbb{1} \otimes H_2,$$
where $H_1, H_2$ are the Hamiltonians of the original two systems.
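
In components, upgrading an operator with the identity corresponds to a Kronecker product of matrices, and the commutation property can be checked directly. This is a minimal sketch: the matrices `T1` and `S1` are arbitrary hypothetical examples, and `kron`, `matmul`, and `identity` are small helpers written for this illustration.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def matmul(a, b):
    """Product of two square matrices of the same size."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def identity(n):
    """n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

T1 = [[0, 1], [1, 0]]                       # hypothetical operator on V = C^2
S1 = [[1, 2, 0], [0, 1j, 0], [3, 0, -1]]    # hypothetical operator on W = C^3

T_up = kron(T1, identity(3))   # T1 (x) 1, acting on the 6-dimensional space
S_up = kron(identity(2), S1)   # 1 (x) S1

# Both orderings should agree with each other and with T1 (x) S1.
print(matmul(T_up, S_up) == matmul(S_up, T_up) == kron(T1, S1))
```

The index convention in `kron` matches the basis ordering $e_i \otimes f_j$ with the $W$ index varying fastest, so `T_up` only mixes the $V$ index and `S_up` only mixes the $W$ index.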
We'll now show an example for all of this: it’s famous and important, because it’s how we can think about 


combining angular momenta. 


Example 224 


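Consider two spin 1/2 particles: the first one has basis states $|+\rangle_1, |-\rangle_1$, while the second has basis states
$|+\rangle_2, |-\rangle_2$.

To form the tensor product, we need the four basis vectors where we take the product of the basis vectors: our
space is
$$\mathrm{span}\left(|+\rangle_1 \otimes |+\rangle_2,\ |+\rangle_1 \otimes |-\rangle_2,\ |-\rangle_1 \otimes |+\rangle_2,\ |-\rangle_1 \otimes |-\rangle_2\right).$$
In other words, two spin states form a four-dimensional complex vector space: a general state in this space looks
like
$$|\psi\rangle = \alpha_1(|+\rangle_1 \otimes |+\rangle_2) + \alpha_2(|+\rangle_1 \otimes |-\rangle_2) + \alpha_3(|-\rangle_1 \otimes |+\rangle_2) + \alpha_4(|-\rangle_1 \otimes |-\rangle_2).$$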


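We can do a simple computation with this: let's act with the “total $z$-component of angular momentum” on this state
$\psi$. This total $z$-component is just the $z$-component of the first particle's angular momentum plus the $z$-component
of the second particle's angular momentum, so our operator is
$$S_z^T = S_z^{(1)} + S_z^{(2)} = S_z^{(1)} \otimes \mathbb{1} + \mathbb{1} \otimes S_z^{(2)}.$$
Essentially, we're constructing a new operator $S_z^T$ on the new (larger) vector space. And now we can calculate this
term by term: since $S_z^{(1)} \otimes \mathbb{1}$ acts on our state, it acts on each term of the vector $|\psi\rangle$. We can pull out the constants
and then just apply $S_z^{(1)}$ to the $v$-vectors: thus,
$$(S_z^{(1)} \otimes \mathbb{1})|\psi\rangle = \alpha_1S_z|+\rangle \otimes |+\rangle + \alpha_2S_z|+\rangle \otimes |-\rangle + \alpha_3S_z|-\rangle \otimes |+\rangle + \alpha_4S_z|-\rangle \otimes |-\rangle.$$
(We've dropped the subscripts for convenience.) And now we know that $S_z|+\rangle = \frac{\hbar}{2}|+\rangle$ (and $S_z|-\rangle = -\frac{\hbar}{2}|-\rangle$), so
the number comes out of the tensor:
$$(S_z^{(1)} \otimes \mathbb{1})|\psi\rangle = \frac{\hbar}{2}\left(\alpha_1|+\rangle \otimes |+\rangle + \alpha_2|+\rangle \otimes |-\rangle - \alpha_3|-\rangle \otimes |+\rangle - \alpha_4|-\rangle \otimes |-\rangle\right).$$
We can do the other one pretty quickly as well:
$$(\mathbb{1} \otimes S_z^{(2)})|\psi\rangle = \frac{\hbar}{2}\left(\alpha_1|+\rangle \otimes |+\rangle - \alpha_2|+\rangle \otimes |-\rangle + \alpha_3|-\rangle \otimes |+\rangle - \alpha_4|-\rangle \otimes |-\rangle\right).$$
So if we add these together, we get the total operator $S_z^T$, and thus
$$S_z^T|\psi\rangle = \frac{\hbar}{2}\left(2\alpha_1|+\rangle \otimes |+\rangle - 2\alpha_4|-\rangle \otimes |-\rangle\right).$$
And now for any state with total $z$-angular momentum $S_z^T = 0$, we must have $\alpha_1 = \alpha_4 = 0$ (because those two vectors
on the right hand side are independent). We will see soon that there is a state whose total angular momentum in all
three directions is zero.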
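
The same computation can be done with explicit matrices in the ordered basis $|{+}{+}\rangle, |{+}{-}\rangle, |{-}{+}\rangle, |{-}{-}\rangle$. This is a sketch under the assumption $\hbar = 1$, with arbitrary hypothetical coefficients standing in for the four amplitudes; `kron` and `apply` are helpers defined here for the example.

```python
HBAR = 1.0  # units with hbar = 1 (an assumption of this sketch)

def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def apply(op, state):
    """Matrix-vector product op |state>."""
    return [sum(op[i][k] * state[k] for k in range(len(state))) for i in range(len(op))]

SZ = [[HBAR / 2, 0], [0, -HBAR / 2]]
ONE = [[1, 0], [0, 1]]

# Total S_z = S_z (x) 1 + 1 (x) S_z on the four-dimensional two-spin space.
first, second = kron(SZ, ONE), kron(ONE, SZ)
SZ_TOTAL = [[first[i][j] + second[i][j] for j in range(4)] for i in range(4)]

alphas = [0.1 + 0.2j, 0.5, -0.3j, 0.7]   # hypothetical amplitudes for ++, +-, -+, --
result = apply(SZ_TOTAL, alphas)
print(result)  # only the ++ and -- components survive, with opposite signs
```

The middle two components vanish for any choice of amplitudes, which is the statement that $|{+}{-}\rangle$ and $|{-}{+}\rangle$ both have total $z$-component zero.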

One thing that we haven't said very much about is the zero vector of this tensor space. We know that there is a
zero vector in $V \otimes W$, and in this case, it looks a bit more complicated than usual. Consider the vector
$$0 \otimes w, \quad w \in W.$$
This is actually the zero vector for any vector $w$, and similarly $v \otimes 0$ is the zero vector for any $v \in V$. This is because
we can pick $a = 0$ in the statement $a(v \otimes w) = (av) \otimes w$: then the left side is just $0$, while the right side is $0 \otimes w$. This
means that having $0$ in either input guarantees that we have the zero vector.

We can now try to get numbers out of our tensor space: specifically, we can define a new inner product. As always, we should define this object to our best ability and hope it satisfies the axioms we want: we'll require the inner product to have

$$\Big\langle \sum_{ij} a_{ij}\, v_i \otimes w_j,\ \sum_{pq} b_{pq}\, v_p \otimes w_q \Big\rangle = \sum_{ij} a_{ij}^* \sum_{pq} b_{pq}\, \langle v_i \otimes w_j,\ v_p \otimes w_q \rangle.$$

Essentially, we're assuming linearity in the right input and anti-linearity in the left input, just like our usual inner product. To get a final number, the best thing for us to do is to use an inner product from $V$ and an inner product from $W$:

$$\langle v_i \otimes w_j,\ v_p \otimes w_q \rangle = \langle v_i, v_p \rangle_V\, \langle w_j, w_q \rangle_W.$$

This last step is the most interesting one: we do need to multiply, because setting $v_i = 0$ must yield zero (in that case, we're taking the inner product of zero with some other vector). And this does indeed happen here, because

$$\langle v_i, v_p \rangle_V = 0.$$
Example 225

Let’s now return to the state

$$|\psi\rangle = \alpha\left(|+\rangle \otimes |-\rangle - |-\rangle \otimes |+\rangle\right).$$

(Sometimes we put subscripts 1 and 2 on the ket vectors, so we're careful that we're looking at the right vector spaces. In particular, $|+\rangle_{(1)} \otimes |-\rangle_{(2)}$ and $|-\rangle_{(2)} \otimes |+\rangle_{(1)}$ are the same thing: the order isn't really a problem as long as we keep track of the labels.) This is an example of an entangled state of two spin 1/2 particles. We haven't quite defined that yet, but for now we should try to normalize this state. Like with any other vector space, we take the inner product of the state with itself:

$$\langle \psi, \psi \rangle = \alpha^*\alpha\, \big\langle\, |+\rangle \otimes |-\rangle - |-\rangle \otimes |+\rangle,\ |+\rangle \otimes |-\rangle - |-\rangle \otimes |+\rangle \,\big\rangle.$$

But every term inside the inner product is now part of an orthonormal basis for the tensor space: the squared terms give $\langle +|+\rangle \langle -|-\rangle = 1$, while the cross terms give nothing because $\langle +|-\rangle = \langle -|+\rangle = 0$. (One way to visualize this is that we can turn all the kets on the left argument into bras.) Either way, this means that $\langle \psi, \psi \rangle = 2|\alpha|^2$, so normalizing the state yields

$$|\psi\rangle = \frac{1}{\sqrt{2}}\left(|+\rangle \otimes |-\rangle - |-\rangle \otimes |+\rangle\right).$$


It turns out this state actually is the one with zero total angular momentum (in the x, y, and z directions). This state 
is rotationally invariant — if we apply a rotation operator to this state by rotating both spaces, the state that comes 
out is the same! 

So now we're ready to talk about the concept of entanglement: entangled states are those where we cannot say that “the first particle does something and the second particle does something else.” We know that $V \otimes W$ includes superpositions (that is, sums) $\sum_{ij} a_{ij}\, v_i \otimes w_j$. If we're given such a superposition, a good question to ask is whether we can write it in the form $v_* \otimes w_*$ for some $v_* \in V$, $w_* \in W$. If we could say that, then we would know that the first particle is in the state $v_*$ and the second particle is in the state $w_*$: the two particles are not actually entangled if we can factor our state.

It seems like this is a complicated factorization problem — it might take some time to see whether a state is an 
entangled state or not. (Note that being entangled is a basis-independent problem!) Let's illustrate how this would 


work with an example: 


Example 226 


Suppose $V, W$ are two-dimensional complex vector spaces with bases $(e_1, e_2)$ and $(f_1, f_2)$, respectively.

The most general state looks like

$$a_{11}\, e_1 \otimes f_1 + a_{12}\, e_1 \otimes f_2 + a_{21}\, e_2 \otimes f_1 + a_{22}\, e_2 \otimes f_2.$$

There are $2 \times 2 = 4$ basis states, and we want to ask whether we can write this as $(a_1 e_1 + a_2 e_2) \otimes (b_1 f_1 + b_2 f_2)$ (this is the most general way to write a vector in $V$ and a vector in $W$). Luckily, it’s pretty easy to see when such a factorization of the numbers $a_{11}, a_{12}, a_{21}, a_{22}$ exists: distributing out, this means that

$$a_{11} = a_1 b_1, \quad a_{12} = a_1 b_2, \quad a_{21} = a_2 b_1, \quad a_{22} = a_2 b_2.$$

This gives us a consistency condition for the four equations: note that

$$a_{11} a_{22} = a_1 b_1 a_2 b_2 = a_{12} a_{21},$$

so four numbers $a_{11}, a_{12}, a_{21}, a_{22}$ can only factor if the determinant of the matrix $\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$ is zero. And with a quick argument, we can check that whenever the determinant is zero, there exists a solution! So in this case, the determinant of the matrix is zero exactly when the two particles are not entangled.
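The determinant criterion, together with the "quick argument" that a vanishing determinant always allows a factorization, can be sketched in plain Python. The function names `is_product_state` and `factor` and the tolerance `tol` are hypothetical choices for this illustration; the factorization returned is one of many (the overall scale can be moved between the two factors).

```python
def is_product_state(a11, a12, a21, a22, tol=1e-12):
    """True when a11*a22 - a12*a21 vanishes, i.e. the state is not entangled."""
    return abs(a11 * a22 - a12 * a21) < tol


def factor(a11, a12, a21, a22, tol=1e-12):
    """Return ((a1, a2), (b1, b2)) with a_ij = a_i * b_j, or None if entangled."""
    if not is_product_state(a11, a12, a21, a22, tol):
        return None
    if abs(a11) > tol or abs(a12) > tol:
        # Use the first row as (b1, b2); the second row must be a multiple of it.
        ratio = a21 / a11 if abs(a11) > tol else a22 / a12
        return (1, ratio), (a11, a12)
    # First row is zero, so the state is e2 (x) (a21 f1 + a22 f2).
    return (0, 1), (a21, a22)


s = 2 ** -0.5
print(is_product_state(s, 0, 0, s))  # (|++> + |-->)/sqrt(2)
print(factor(1, 2, 3, 6))            # a product state
```

The first call tests the state $\frac{1}{\sqrt{2}}(e_1 \otimes f_1 + e_2 \otimes f_2)$, whose determinant is $\frac{1}{2}$, so no factorization exists.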
However, there are many entangled states, and there’s “enough of them” that we can construct a basis of our 
tensor product space such that all basis vectors are entangled states. To do that, we'll use our spin 1/2 system again: 


let $V$ be the state space for a spin 1/2 particle, and consider a two-particle system $V \otimes V$. We'll take

$$|\Phi_0\rangle = \frac{1}{\sqrt{2}}\left(|+\rangle|+\rangle + |-\rangle|-\rangle\right)$$

(note that we've dropped the $\otimes$ symbol, and eventually we're also going to make the ket simpler and just write $|++\rangle$; our notation will evolve as our calculations get more complicated). This is similar to the state that we just built: it's already normalized, and we can check that by taking the dual and directly evaluating the inner product. Indeed, this is an entangled state, because we have a matrix representation of

$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

and the determinant is nonzero.


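We still need three other basis states, and we'll write them in the following form:

$$|\Phi_i\rangle = (I \otimes \sigma_i)\, |\Phi_0\rangle, \quad i = 1, 2, 3.$$

For example,

$$|\Phi_1\rangle = (I \otimes \sigma_1)\, |\Phi_0\rangle = (I \otimes \sigma_1)\, \frac{1}{\sqrt{2}}\left(|+\rangle|+\rangle + |-\rangle|-\rangle\right),$$

and now the $I$ acts on the first ket, while the $\sigma_1$ acts on the second ket, and we're left with

$$|\Phi_1\rangle = \frac{1}{\sqrt{2}}\left(|+\rangle|-\rangle + |-\rangle|+\rangle\right),$$

because $\sigma_1$ is the matrix $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. We can check that $|\Phi_1\rangle$ is orthogonal to $|\Phi_0\rangle$: none of the terms have both labels matching, so the inner product is just zero. Similarly, we have that

$$|\Phi_2\rangle = \frac{i}{\sqrt{2}}\left(|+\rangle|-\rangle - |-\rangle|+\rangle\right)$$

and

$$|\Phi_3\rangle = \frac{1}{\sqrt{2}}\left(|+\rangle|+\rangle - |-\rangle|-\rangle\right).$$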


We can indeed verify that these are all orthogonal to $|\Phi_0\rangle$, and we also need to do the calculation for $\langle \Phi_i|\Phi_j\rangle$. But this time, we don’t need to do everything by inspection: since the Pauli matrices are Hermitian, we have

$$\langle \Phi_i|\Phi_j\rangle = \langle \Phi_0|\,(I \otimes \sigma_i)(I \otimes \sigma_j)\,\Phi_0\rangle,$$

since “moving from one argument to the other” is the definition of a Hermitian operator in terms of the inner product. And now we can make progress using the Pauli identities: operators multiply in the most direct way, so $I \cdot I$ is just the identity operator, while $\sigma_i\sigma_j$ is the identity plus a Pauli matrix: it'll be $I\delta_{ij} + i\epsilon_{ijk}\sigma_k$. So plugging this in,

$$\langle \Phi_i|\Phi_j\rangle = \langle \Phi_0|\,\big(I \otimes (I\delta_{ij} + i\epsilon_{ijk}\sigma_k)\big)\,\Phi_0\rangle,$$

and now the $I\delta_{ij}$ term just gives us a $\delta_{ij}$, while $(I \otimes \sigma_k)\,\Phi_0 = \Phi_k$ is orthogonal to $\Phi_0$ (as we just showed)! Thus,

$$\langle \Phi_i|\Phi_j\rangle = \delta_{ij},$$


and we've indeed shown that we have an orthonormal basis of the tensor product of two spin 1/2 particles! We can 


now write down our conventional basis states in terms of the entangled states: 


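$$|+\rangle|+\rangle = \frac{1}{\sqrt{2}}\left(|\Phi_0\rangle + |\Phi_3\rangle\right).$$

Similarly, we can find the others by a direct inspection:

$$|+\rangle|-\rangle = \frac{1}{\sqrt{2}}\left(|\Phi_1\rangle - i|\Phi_2\rangle\right),$$

$$|-\rangle|+\rangle = \frac{1}{\sqrt{2}}\left(|\Phi_1\rangle + i|\Phi_2\rangle\right),$$

$$|-\rangle|-\rangle = \frac{1}{\sqrt{2}}\left(|\Phi_0\rangle - |\Phi_3\rangle\right).$$

The vectors $|\Phi_0\rangle, |\Phi_1\rangle, |\Phi_2\rangle$, and $|\Phi_3\rangle$ are known as the Bell basis for this system.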
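As a sanity check on the Bell basis, here is a small plain-Python sketch that builds $|\Phi_i\rangle = (I \otimes \sigma_i)|\Phi_0\rangle$ and verifies orthonormality numerically. It assumes the basis order $|+\rangle|+\rangle, |+\rangle|-\rangle, |-\rangle|+\rangle, |-\rangle|-\rangle$; the helper names are made up for this illustration.

```python
import math


def kron(a, b):
    """Kronecker (tensor) product of two matrices stored as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def apply(m, v):
    """Matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def inner(u, v):
    """Inner product, anti-linear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))


I2 = [[1, 0], [0, 1]]
PAULI = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

s = 1 / math.sqrt(2)
phi0 = [s, 0, 0, s]
bell = [phi0] + [apply(kron(I2, sigma), phi0) for sigma in PAULI]

# Gram matrix of inner products; should be the 4x4 identity.
gram = [[inner(u, v) for v in bell] for u in bell]
orthonormal = all(
    abs(gram[i][j] - (1 if i == j else 0)) < 1e-12
    for i in range(4) for j in range(4)
)

# Check |+>|-> = (Phi_1 - i Phi_2) / sqrt(2)
plus_minus = [s * (bell[1][k] - 1j * bell[2][k]) for k in range(4)]

print(orthonormal)
print(plus_minus)
```

Up to rounding, `plus_minus` has a single nonzero entry in the $|+\rangle|-\rangle$ slot, matching the expansion written in terms of $|\Phi_1\rangle$ and $|\Phi_2\rangle$.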

We'll now move on to the concepts of measurement and teleportation. Recall that there is a postulate that in an orthonormal basis, we can find the probabilities of our state being along these basis states after a measurement. Before our experiment, the state is in a superposition of these basis states, but it will collapse into one of them, each with some probability. For example, in the Stern-Gerlach experiment, we picked two basis states, $|+\rangle$ and $|-\rangle$, and the device collapses our state into one of those two. What we're saying here is slightly more general. Specifically, if we have any orthonormal basis $(|e_1\rangle, \cdots, |e_n\rangle)$, we can construct a machine that measures a state $|\psi\rangle$ to be in the state $|e_i\rangle$ with probability $|\langle e_i|\psi\rangle|^2$, and then after that measurement, we'll be in one of the states $|e_k\rangle$.

The other point that we should note is that Pauli matrices are Hermitian and square to $I$, so they're actually unitary, and thus they can govern time-evolution of a system! For example, multiplying a state by $\sigma_i$ doesn't need to be a purely mathematical operation: because it’s unitary, we can construct a suitable Hamiltonian that evolves the state through some time. For example, with our spin states, we can take a magnetic field that exists for a few picoseconds, and that will implement $\sigma_i$! Indeed, we can check that

$$e^{i\frac{\pi}{2}(-I + \sigma_i)} = e^{-i\pi/2}\, e^{i\frac{\pi}{2}\sigma_i} = -i\left(I\cos\frac{\pi}{2} + i\sigma_i\sin\frac{\pi}{2}\right) = \sigma_i,$$

and thus we've written $\sigma_i$ as the exponential of $i$ times a Hermitian operator, so we can just pick a Hamiltonian $H$ such that $\frac{\pi}{2}(-I + \sigma_i) = -\frac{Ht}{\hbar}$. In other words, we can physically realize $\sigma_i$ with a machine.
So now we're ready to discuss teleportation: this is a hot topic of science fiction, and it was an idea that was 


impossible classically. But in quantum mechanics, we can do much better, and we'll be explaining this now! 


Fact 227 


This discovery actually came from 1993, so this hasn't been known for a long time. Quantum mechanics is a 


Renaissance of physics in some sense, because now we can do lots of cool experiments. 


The following idea came from Bennett (IBM), Brassard, Crépeau, Jozsa (Montreal), Peres (Technion), and Wootters (Williams); it was a big collaboration. Here’s the setup:
Example 228
Two people, Alice and Bob, play a game. Alice has a quantum state: it’s an unmeasured state of a spin 1/2 particle, $\alpha|+\rangle + \beta|-\rangle$. Her goal is to teleport the state to Bob, who is far away. (A spin state in this context is also sometimes called a qubit.)

First of all, we might ask why we don’t just make a copy of our state. The issue is that there's a no-cloning principle: we can't create a copy of an unknown state like this. Similarly, Alice can’t measure the state to find $\alpha$ and $\beta$, because she only has one copy of the state: the Stern-Gerlach experiment would just give a single $|+\rangle$ or $|-\rangle$, and then the qubit is gone. So no matter what, Alice should not measure the state.

On the other hand, perhaps Alice created this state with a specific Hamiltonian, so she knows what $\alpha$ and $\beta$ are. So she could tell Bob those numbers, but if $\alpha$ is some irrational number which requires an infinite string of information to transmit, that’s not good either! So instead, we'll try to produce an experiment in which Bob will get the state on the other side: our state will teleport.


Basically, let's call this state space $C$: we'll write that Alice’s original state is

$$|\psi\rangle_C = \alpha|+\rangle_C + \beta|-\rangle_C.$$

The whole idea with teleportation is to use an entangled state here! We can produce an entangled pair of two particles, where one particle is given to Alice and the other to Bob. The correlations from entanglement show up instantaneously, but there’s no way to send information through entanglement alone. If we wanted to teleport a person, we'd have to create
a reservoir of billions of entangled pairs in two different locations, and then we'd have to take these billions of pairs 
and do a bunch of measurements so that every quantum state in the person’s body is measured with some entangled 
state. And that’s essentially what's happening here — Alice will do a measurement so that the particle will become the 


state we wanted to teleport initially! 


Fact 229 


Alice will need to send some additional information as well: suppose Alice has a console with four lights, labeled 


0,1, 2,3. Alice will need to send two bits of information — which of those four lights lit up during the measurement 


— and then Bob will use that information to send B into one of four machines, labeled 0, 1, 2, 3. 


It turns out that after this process, Alice’s state will be destroyed, but Bob will have a copy of the state, and that’s what we'll explain now. We'll start with the AB pair (this explains the name C for the teleported state), which is the entangled state

$$|\Phi_0\rangle_{AB} = \frac{1}{\sqrt{2}}\left(|+\rangle_A|+\rangle_B + |-\rangle_A|-\rangle_B\right).$$

Even though the particles Alice and Bob have can be very far apart, they're still entangled. So we can take the total tensor product of the particles from A, B, and C: this yields

$$|\Phi_0\rangle_{AB} \otimes \left(\alpha|+\rangle_C + \beta|-\rangle_C\right).$$

Here’s the key point: Alice will do a sneaky measurement with the particles A and C. (Remember that particle A by itself does not carry a definite state, because A and B are entangled.) Since Alice has these two particles, she can pick any orthonormal basis of the two-particle state space, because of the earlier notion that we can measure with any orthonormal basis! We'll use the Bell basis for A and C. First, we can rewrite our above tensor product as

$$\frac{1}{\sqrt{2}}\left(|+\rangle_A|+\rangle_B + |-\rangle_A|-\rangle_B\right) \otimes \left(\alpha|+\rangle_C + \beta|-\rangle_C\right),$$
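and now we can multiply everything out: this evaluates to

$$= \frac{1}{\sqrt{2}}\Big[\alpha|+\rangle_A|+\rangle_C|+\rangle_B + \beta|+\rangle_A|-\rangle_C|+\rangle_B + \alpha|-\rangle_A|+\rangle_C|-\rangle_B + \beta|-\rangle_A|-\rangle_C|-\rangle_B\Big]$$

(the order of multiplication doesn’t matter, as long as we keep the labels). We've written this so that we have A-and-C vectors that are orthonormal to each other. However, our basis isn’t entangled between the particles A and C yet. Instead, we'll mathematically rewrite it using the formulas for $|\pm\rangle|\pm\rangle$ in terms of the $|\Phi_i\rangle$s that we derived above. This is a bit of algebra, but our result is

$$\frac{1}{2}\left(|\Phi_0\rangle_{AC} + |\Phi_3\rangle_{AC}\right)\alpha|+\rangle_B + \frac{1}{2}\left(|\Phi_1\rangle_{AC} - i|\Phi_2\rangle_{AC}\right)\beta|+\rangle_B$$

$$+ \frac{1}{2}\left(|\Phi_1\rangle_{AC} + i|\Phi_2\rangle_{AC}\right)\alpha|-\rangle_B + \frac{1}{2}\left(|\Phi_0\rangle_{AC} - |\Phi_3\rangle_{AC}\right)\beta|-\rangle_B,$$

and now we can collect terms across the $\Phi$s to find

$$= \frac{1}{2}|\Phi_0\rangle_{AC}\left(\alpha|+\rangle_B + \beta|-\rangle_B\right) + \frac{1}{2}|\Phi_1\rangle_{AC}\left(\beta|+\rangle_B + \alpha|-\rangle_B\right) + \frac{1}{2}|\Phi_2\rangle_{AC}\left(i\alpha|-\rangle_B - i\beta|+\rangle_B\right) + \frac{1}{2}|\Phi_3\rangle_{AC}\left(\alpha|+\rangle_B - \beta|-\rangle_B\right).$$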


Remember that we haven't done anything yet: we're just rewriting the state mathematically. But something funny has happened. The state that we wanted to transmit, which was originally in particle C, now shows up in particle B in various funny linear combinations. Specifically, we have a $|\psi\rangle_B$ term as the coefficient of $|\Phi_0\rangle$ (where $\psi$ was the original $\alpha|+\rangle + \beta|-\rangle$ that we wanted), and we also have a $\sigma_3|\psi\rangle_B$ term as the coefficient of $|\Phi_3\rangle$, because $\sigma_3$ is of the form $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$: it gives a $+1$ eigenvalue for the $|+\rangle$ state and a $-1$ eigenvalue for the $|-\rangle$ state. Similarly, the other states simplify too: we actually just have that the total state is

$$\frac{1}{2}|\Phi_0\rangle_{AC}\,|\psi\rangle_B + \frac{1}{2}|\Phi_1\rangle_{AC}\,\sigma_1|\psi\rangle_B + \frac{1}{2}|\Phi_2\rangle_{AC}\,\sigma_2|\psi\rangle_B + \frac{1}{2}|\Phi_3\rangle_{AC}\,\sigma_3|\psi\rangle_B.$$


And now comes the physics! Alice measures in the Bell basis of A and C: specifically, we measure one of the four basis states in the equation above. The wave function will collapse into one of these basis states: the 0 basis state makes the 0-labeled light light up, the 1 basis state makes the 1-labeled light light up, and so on. In any case, we now have an entangled pair of particles $|\Phi_i\rangle_{AC}$ which has no memory at all of the original state of C, but now B has the information instead! Whenever light $i$ lights up, this means that Bob now has the particle in the state $\sigma_i|\psi\rangle_B$ (with $\sigma_0 = I$), and now Bob just needs to apply the $\sigma_i$ operator to his state (remember that $\sigma_i^2 = I$). This just means that Bob puts his system into the $i$th machine, which has some specific Hamiltonian, and the state will time-evolve into $|\psi\rangle_B$. Indeed, we've now teleported our state from Alice to Bob, and all we needed to send was the information of which light lit up.
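The teleportation protocol can be simulated with a few lines of plain Python. This is only a sketch under stated assumptions: index 0 stands for $|+\rangle$ and 1 for $|-\rangle$, the Bell states are stored as $2 \times 2$ arrays indexed by (A, C), and the function name `teleport` and the sample amplitudes are invented for the illustration. For each of Alice's four outcomes it projects the three-particle state onto $|\Phi_i\rangle_{AC}$, then applies $\sigma_i$ to Bob's particle.

```python
import math

S = 1 / math.sqrt(2)

# Bell basis of the A-C pair, BELL[i][a][c]; index 0 is |+>, index 1 is |->.
BELL = [
    [[S, 0], [0, S]],               # Phi_0
    [[0, S], [S, 0]],               # Phi_1
    [[0, 1j * S], [-1j * S, 0]],    # Phi_2
    [[S, 0], [0, -S]],              # Phi_3
]
# sigma_0 = identity, then the three Pauli matrices.
SIGMA = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]


def teleport(alpha, beta):
    """Return (probability, Bob's corrected state) for each of Alice's four lights."""
    psi = [alpha, beta]
    # total[a][b][c] = Phi_0(a, b) * psi(c): the A-B pair shares Phi_0, C holds psi.
    total = [[[BELL[0][a][b] * psi[c] for c in range(2)] for b in range(2)]
             for a in range(2)]
    outcomes = []
    for i in range(4):
        # Bob's unnormalized state after Alice finds Phi_i on the A-C pair.
        chi = [sum(BELL[i][a][c].conjugate() * total[a][b][c]
                   for a in range(2) for c in range(2)) for b in range(2)]
        prob = sum(abs(x) ** 2 for x in chi)
        # Bob applies sigma_i (machine i); we also renormalize the state.
        fixed = [sum(SIGMA[i][r][k] * chi[k] for k in range(2)) / math.sqrt(prob)
                 for r in range(2)]
        outcomes.append((prob, fixed))
    return outcomes


outcomes = teleport(0.6, 0.8j)
for prob, state in outcomes:
    print(prob, state)
```

Each light comes on with probability $\frac{1}{4}$ regardless of $\alpha$ and $\beta$, so Alice's measurement result alone reveals nothing about the state; Bob recovers $(\alpha, \beta)$ only after using the two classical bits to choose the right machine.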


28 April 6, 2020 


There are a few announcements regarding grades — the main thing to keep in mind is that each of our tests is now 
15 percent of the total grade, and the homework and lecture questions are now more heavily weighted. This should 


make passing the course more focused on our weekly work, but hopefully we'll still take the tests seriously. 


Remark 230. There are two ways to approach practice problems: some are better for doing better in exams, and others are better for giving physics insight. It’s unfortunately hard to find problems of the former type, except for the unused edX problems. But Griffiths (3rd edition) is the best source for problems in general, though not all of them are relevant. Another good set of books is Cohen-Tannoudji's “Quantum Mechanics,” volumes 1 and 2, which have a lot of worked exercises: every problem is done very slowly, but it's very well explained.
We'll spend some time today putting together the ideas about two-state systems, being a bit more direct and to the point.

We should always remember that two-state systems have two basis states, not two states in general! A good way to summarize a system like this is with a magnetic dipole moment

$$\vec{\mu} = \gamma \vec{S},$$

where $\gamma$ is positive for a positive charge and negative for a negative charge: for example, $\gamma = -\frac{e}{mc}$ for an electron (where we're using Gaussian units, so the $c$ in the denominator is a matter of convention). The Hamiltonian for such a system looks like

$$H = -\vec{\mu} \cdot \vec{B} = -\gamma \vec{B} \cdot \vec{S},$$

where $\vec{B}$ is the magnetic field that the dipole moment is in. Further simplifying for our purposes in the case where the field $\vec{B}$ is constant, we can write $\vec{B} = B\hat{n}$, where $B = |\vec{B}|$ is the (nonnegative) magnitude of our magnetic field. This gives us yet another expression

$$H = -\gamma B\, \hat{n} \cdot \vec{S}.$$

We can then write down the unitary time-evolution operator:

$$e^{-iHt/\hbar} = e^{-i(-\gamma B t)(\hat{n} \cdot \vec{S})/\hbar}.$$


To understand how this affects our states, recall that there is a rotation operator

$$R_{\hat{n}}(\alpha) = e^{-i\alpha\, \hat{n} \cdot \vec{S}/\hbar}$$

parameterized by a rotation axis $\hat{n}$ and an angle $\alpha$: we rotate counterclockwise (with the right-hand rule) around $\hat{n}$ with angle $\alpha$. But these last two boxed expressions can be identified with each other: we can use the same $\hat{n}$, and our angle of rotation is now $\alpha = -\gamma B t$. $\alpha$ is a number (it doesn't have a direction), and thus we can describe our states as rotating around $\hat{n}$ with angular velocity

$$\vec{\omega} = \frac{\alpha}{t}\hat{n} = -\gamma B \hat{n}.$$

Rewriting $B\hat{n}$ as the vector $\vec{B}$, this is the Larmor frequency $\vec{\omega}_L = -\gamma \vec{B}$, and we can use this to rewrite our Hamiltonian as

$$H = \vec{\omega}_L \cdot \vec{S}.$$


To make this more explicit, we can consider the most general Hamiltonian for a two-state system

$$H = g_0 I + \sum_i g_i \sigma_i = g_0 I + \vec{g} \cdot \vec{\sigma},$$

where $g_0, g_1, g_2, g_3$ are real and can be taken to be time-independent. Then the Larmor frequency can be written by replacing $\vec{\sigma}$ with $\frac{2}{\hbar}\vec{S}$: this tells us that

$$\vec{\omega}_L = \frac{2}{\hbar}\vec{g}$$

for our general-form Hamiltonian. Similarly to the magnetic field, we can write $\vec{g} = g\hat{n}$, where $g$ is the magnitude of $\vec{g}$: remember that the operator $\hat{n} \cdot \vec{\sigma}$ has eigenstates $|\hat{n}; \pm\rangle$ with eigenvalues $\pm 1$, so the Hamiltonian

$$H = g_0 I + g(\hat{n} \cdot \vec{\sigma})$$

will have energy eigenvalues of $g_0 + g$ and $g_0 - g$, corresponding to $|\hat{n}; +\rangle$ and $|\hat{n}; -\rangle$ respectively. (Here, we should think of the $g_0 I$ term as not really doing anything, except shifting all of the energies. We do have to be careful, though: adding a $g_0 I$ term to a Hamiltonian will add a phase $e^{-i g_0 t/\hbar}$ to our wavefunction. This is not something we can observe if the $g_0 I$ term shows up in the whole system, but it can be relevant if this is only a subsystem!)
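The claim that $H = g_0 I + \vec{g} \cdot \vec{\sigma}$ has energies $g_0 \pm g$ is easy to test numerically. The sketch below, in plain Python, builds the $2 \times 2$ matrix from the standard Pauli matrices and solves its characteristic polynomial; the function names and the sample values of $g_0$ and $\vec{g}$ are arbitrary choices for the illustration.

```python
import cmath
import math


def hamiltonian(g0, g1, g2, g3):
    """2x2 matrix of g0*I + g1*sigma_1 + g2*sigma_2 + g3*sigma_3."""
    return [[g0 + g3, g1 - 1j * g2],
            [g1 + 1j * g2, g0 - g3]]


def eigenvalues_2x2(h):
    """Eigenvalues from the trace and determinant (real for a Hermitian matrix)."""
    tr = h[0][0] + h[1][1]
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    root = cmath.sqrt(tr * tr / 4 - det)
    return (tr / 2 + root).real, (tr / 2 - root).real


g0, g = 0.5, (1.0, 2.0, 2.0)            # sample values; |g| = 3
g_mag = math.sqrt(sum(x * x for x in g))
e_plus, e_minus = eigenvalues_2x2(hamiltonian(g0, *g))

print(e_plus, g0 + g_mag)
print(e_minus, g0 - g_mag)
```

Changing `g0` shifts both printed energies by the same amount and leaves their splitting $2g$ untouched, which is the sense in which the $g_0 I$ term "doesn't do anything" to the dynamics of an isolated system.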


We'll spend some time trying some exercises now: 


Problem 231 


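Recall the expression for our coherent state

$$|\alpha\rangle = e^{-|\alpha|^2/2}\, e^{\alpha a^\dagger}\, |0\rangle.$$

Use this expression to calculate the overlap $\langle \beta|\alpha\rangle$.

We have that

$$\langle \beta|\alpha\rangle = \langle 0|\left(e^{-|\beta|^2/2}\, e^{\beta a^\dagger}\right)^\dagger e^{-|\alpha|^2/2}\, e^{\alpha a^\dagger}\, |0\rangle.$$

Since $e^{-|\alpha|^2/2}$ and $e^{-|\beta|^2/2}$ are both real numbers, we can pull them out of the bra-ket expression (the dagger doesn’t affect the $e^{-|\beta|^2/2}$ term), leaving

$$= e^{-|\alpha|^2/2 - |\beta|^2/2}\, \langle 0|\, e^{\beta^* a}\, e^{\alpha a^\dagger}\, |0\rangle.$$

We can now expand the exponentials as

$$= e^{-|\alpha|^2/2 - |\beta|^2/2} \sum_{i,j} \frac{(\beta^*)^i \alpha^j}{i!\, j!}\, \langle 0|\, a^i (a^\dagger)^j\, |0\rangle,$$

and then we can put a factor of $\sqrt{i!}$ and $\sqrt{j!}$ into the denominators of the bra-ket so that we get orthonormal energy eigenstates (recall that $(a^\dagger)^j|0\rangle = \sqrt{j!}\,|j\rangle$): thus this expansion will yield

$$= e^{-|\alpha|^2/2 - |\beta|^2/2} \sum_{i,j} \frac{(\beta^*)^i \alpha^j}{\sqrt{i!\, j!}}\, \langle i|j\rangle.$$

Working a bit more (using $\langle i|j\rangle = \delta_{ij}$ and summing the resulting exponential series) will yield an answer of $e^{-|\alpha|^2/2 - |\beta|^2/2 + \beta^*\alpha}$.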
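The closed form for the coherent-state overlap can be compared against a direct sum over number states. This is a rough numerical sketch in plain Python, assuming the standard expansion coefficients $\langle n|\alpha\rangle = e^{-|\alpha|^2/2}\alpha^n/\sqrt{n!}$ and a truncation at `n_max` terms (an arbitrary cutoff that is more than enough for the small sample amplitudes used here).

```python
import cmath
import math


def coherent_coeffs(alpha, n_max=60):
    """Number-basis coefficients <n|alpha> for n = 0, ..., n_max - 1."""
    norm = math.exp(-abs(alpha) ** 2 / 2)
    return [norm * alpha ** n / math.sqrt(math.factorial(n)) for n in range(n_max)]


def overlap_series(beta, alpha, n_max=60):
    """<beta|alpha> as a truncated sum over number states."""
    return sum(b.conjugate() * a
               for b, a in zip(coherent_coeffs(beta, n_max),
                               coherent_coeffs(alpha, n_max)))


def overlap_closed(beta, alpha):
    """Closed form exp(-|alpha|^2/2 - |beta|^2/2 + beta^* alpha)."""
    return cmath.exp(-abs(alpha) ** 2 / 2 - abs(beta) ** 2 / 2
                     + beta.conjugate() * alpha)


alpha, beta = 0.7 + 0.2j, -0.3 + 0.5j   # sample amplitudes
series = overlap_series(beta, alpha)
closed = overlap_closed(beta, alpha)
print(series, closed)
```

Setting `beta = alpha` gives an overlap of 1, confirming that the coherent states are normalized; for distinct amplitudes the overlap is nonzero, so coherent states are not orthogonal.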


Problem 232 


Suppose we have a two-body Hamiltonian $H = H_1 \otimes I + I \otimes H_2$. Show that the time-evolution operator $e^{-iHt/\hbar}$ can be written as a tensor product.

Note that a Hamiltonian can’t really look like $H_1 \otimes H_2$, because that would have units of squared energy. So the above expression is actually the natural Hamiltonian for two particles that don’t talk to each other!

The idea is to first plug in directly, yielding

$$e^{-iHt/\hbar} = e^{-\frac{it}{\hbar}(H_1 \otimes I) - \frac{it}{\hbar}(I \otimes H_2)}.$$

The operators in the exponent here do commute: they act on different worlds, so we can rewrite this as a product of exponentials

$$= e^{-\frac{it}{\hbar}(H_1 \otimes I)}\; e^{-\frac{it}{\hbar}(I \otimes H_2)}.$$

This is an ordinary product: both expressions are operators on the tensor product space, so we should not take their tensor product. We can then rewrite this as

$$\left(e^{-iH_1 t/\hbar} \otimes I\right) \cdot \left(I \otimes e^{-iH_2 t/\hbar}\right)$$

(expanding each exponential as a power series, every power of $H_1 \otimes I$ is just $H_1^n \otimes I$), and now we can multiply the two operators by composition to yield $e^{-iH_1 t/\hbar} \otimes e^{-iH_2 t/\hbar}$. This is a fundamental result: we just tensor product the time-evolution operators for the two spaces! (It’s good to make sure we can understand the logic here; that means we're on our way to understanding tensor product spaces.)
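The factorization of the time-evolution operator can be checked on a small example. The sketch below uses plain Python with a truncated power series for the matrix exponential; the sample Hamiltonians `H1` and `H2`, the time `t`, and the choice of units with $\hbar = 1$ are all assumptions made for the illustration, and the series-based `expm` is only meant for small, well-scaled matrices like these.

```python
def kron(a, b):
    """Kronecker (tensor) product of two matrices stored as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    """Ordinary matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def expm(m, terms=40):
    """Matrix exponential from a truncated power series (small matrices only)."""
    result = identity(len(m))
    term = identity(len(m))
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, m)]
        result = [[r + s for r, s in zip(row_r, row_t)]
                  for row_r, row_t in zip(result, term)]
    return result


def evolution(h, t):
    """exp(-i h t), in units where hbar = 1."""
    return expm([[-1j * t * x for x in row] for row in h])


I2 = identity(2)
H1 = [[0.0, 1.0], [1.0, 0.0]]     # sample single-particle Hamiltonians
H2 = [[0.5, 0.0], [0.0, -0.5]]
H = [[x + y for x, y in zip(ra, rb)]
     for ra, rb in zip(kron(H1, I2), kron(I2, H2))]

t = 0.8
lhs = evolution(H, t)                              # exp(-iHt) on the product space
rhs = kron(evolution(H1, t), evolution(H2, t))     # tensor product of the two
max_diff = max(abs(x - y) for rl, rr in zip(lhs, rhs) for x, y in zip(rl, rr))
print(max_diff)
```

The printed difference is at the level of floating-point rounding, which is what the argument predicts for any pair of non-interacting Hamiltonians.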


29 Multiparticle States, Part 2 


Last lecture, we started discussing the singlet state

$$|\psi\rangle = \frac{1}{\sqrt{2}}\left(|+\rangle_1|-\rangle_2 - |-\rangle_1|+\rangle_2\right).$$

This state has a few interesting properties: its total angular momentum is zero (in the x, y, and z directions), so it is a rotationally invariant state. It is also an entangled state (which we used when discussing quantum teleportation), and it isn’t hard to realize physically.


Example 233 


Particles can decay in such an entangled state physically: for example, a meson called an $\eta^0$ is an interacting particle, which decays into a $\mu^+$ and a $\mu^-$ particle.

Since $\eta^0$ has zero angular momentum (it doesn’t spin), conservation of angular momentum tells us that the $\mu^\pm$ particles will be in a state like $|\psi\rangle$, as long as there is no orbital angular momentum. So it’s pretty easy to create such an entangled state!


We have also shown that

$$|\psi\rangle = \frac{1}{\sqrt{2}}\left(|\hat{n}; +\rangle_1 |\hat{n}; -\rangle_2 - |\hat{n}; -\rangle_1 |\hat{n}; +\rangle_2\right)$$

for any direction $\hat{n}$ because of rotational invariance. We can use this to talk about probability:


Definition 234 
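Let $P(a+, b+)$ be the probability that we find the first particle to be in state $|\hat{a}; +\rangle$ and the second particle to be in state $|\hat{b}; +\rangle$ when we measure the singlet state along the $\hat{a}, \hat{b}$ directions respectively.

Calculating such a probability is nontrivial, but we can use the fact that our state $|\psi\rangle$ is rotationally invariant. Picking our normal vector to be $\hat{a}$, we know that

$$|\psi\rangle = \frac{1}{\sqrt{2}}\left(|\hat{a}; +\rangle_1 |\hat{a}; -\rangle_2 - |\hat{a}; -\rangle_1 |\hat{a}; +\rangle_2\right),$$

so

$$P(a+, b+) = \left|\; {}_1\langle \hat{a}; +|\; {}_2\langle \hat{b}; +|\; \frac{1}{\sqrt{2}}\left(|\hat{a}; +\rangle_1 |\hat{a}; -\rangle_2 - |\hat{a}; -\rangle_1 |\hat{a}; +\rangle_2\right) \right|^2.$$

We can evaluate each of these terms: remember that we evaluate the inner product for a tensor product by doing the inner products in the individual spaces. The second term drops out, because $\langle \hat{a}; +|\hat{a}; -\rangle$ is zero, and this just leaves us with

$$P(a+, b+) = \left|\frac{1}{\sqrt{2}}\langle \hat{b}; +|\hat{a}; -\rangle\right|^2 = \frac{1}{2}\left|\langle \hat{b}; +|\hat{a}; -\rangle\right|^2.$$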
To finish simplifying this, we can calculate the overlap between two spin states along directions $\hat{n}$ and $\hat{n}'$: recall from early on in the class that the squared overlap is $|\langle \hat{n}'; +|\hat{n}; +\rangle|^2 = \cos^2\frac{\gamma}{2}$, where $\gamma$ is the angle between the two directions. Since we have a minus sign above, we should use the $-\hat{a}$ vector instead, meaning that our angle is $\pi - \theta_{ab}$ instead of $\theta_{ab}$ (the angle between the two vectors). Thus, our final answer is

$$P(a+, b+) = \frac{1}{2}\cos^2\left(\frac{1}{2}(\pi - \theta_{ab})\right) = \frac{1}{2}\sin^2\frac{\theta_{ab}}{2}.$$

For example, if the second particle is being measured along $\hat{b} = -\hat{a}$ (they point in completely opposite directions), then the probability should be $\frac{1}{2}$, because we can look at the boxed expression for $|\psi\rangle$ above: the first term, $|\hat{a}; +\rangle_1 |\hat{a}; -\rangle_2$, corresponds to $a$ and $b$ both being positive, while the second term corresponds to them both being negative. And indeed, $\theta_{ab} = \pi$ in this case, and our probability is $\frac{1}{2}$.
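The formula $P(a+, b+) = \frac{1}{2}\sin^2\frac{\theta_{ab}}{2}$ can be verified by computing the amplitude directly from the singlet state. The sketch below is restricted, as a simplifying assumption, to directions in the x-z plane, where the spin-up state along polar angle $\theta$ has real components $(\cos\frac{\theta}{2}, \sin\frac{\theta}{2})$ in the $|\pm\rangle$ basis; the function names are hypothetical.

```python
import math


def spin_up(theta):
    """|n; +> for a direction in the x-z plane at polar angle theta (real components)."""
    return [math.cos(theta / 2), math.sin(theta / 2)]


def prob_both_up(theta_a, theta_b):
    """P(a+, b+) for the singlet, from the overlap with |a; +>_1 |b; +>_2."""
    a, b = spin_up(theta_a), spin_up(theta_b)
    s = 1 / math.sqrt(2)
    # Singlet amplitudes indexed [spin 1][spin 2]; index 0 is |+>, index 1 is |->.
    singlet = [[0, s], [-s, 0]]
    # The components are real here, so no complex conjugation is needed.
    amp = sum(a[i] * b[j] * singlet[i][j] for i in range(2) for j in range(2))
    return abs(amp) ** 2


for theta_a, theta_b in [(0.0, math.pi), (0.0, math.pi / 2), (0.3, 1.7)]:
    direct = prob_both_up(theta_a, theta_b)
    formula = 0.5 * math.sin((theta_b - theta_a) / 2) ** 2
    print(direct, formula)
```

The first pair of angles reproduces the opposite-direction case with probability $\frac{1}{2}$, and the second is the case of perpendicular directions.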

Another interesting case is to consider $P(\hat{z}+, \hat{x}+)$: these two vectors have $\theta_{ab} = \frac{\pi}{2}$, so the probability that they are both measured to be positive is $\frac{1}{2}\sin^2\frac{\pi}{4} = \frac{1}{4}$.

With this, we can discuss the EPR paradox. We might have seen this in 8.04, but now we have the mathematics to appreciate it more completely. And this will lead to the Bell inequalities soon after.


Fact 235 


The EPR story began when Einstein, Podolsky, and Rosen wrote a paper about local realism. 


This sounds like philosophy, and people thought the question was undecidable for a while. While it’s difficult 
to pin down the actual definition of local realism, one main idea regards two assumptions that we make about 


measurement results: 


• When we measure something and get a number, this measurement corresponds to “some aspect of reality.” In other words, there is something real about our object.

• Measurements that we do (for example, in a lab) are not affected by measurements or other actions that are done far away (for instance, on the moon) at the same time, because there's no time for the information to propagate between the two actions.


Einstein was very vocal about insisting that physics must satisfy both of these assumptions — while he was correct 
and insightful about the photoelectric effect and relativity, he was unfortunately wrong in this case. 

It seems very reasonable that the first assumption would be true — Einstein would perhaps say that a spin up particle 
is always spin up, and we discover that fact through measurement. Then a way to get around us not knowing whether 
a particle exits the Stern-Gerlach machine spin up or down is to try using hidden variables: perhaps there are some 
properties of our particles that we just don’t know, but if we knew those properties, we could predict the result of our 
experiment. This might sound like an untestable hypothesis, but it isn’t — we'll see this soon! 

And the second assumption breaking seems even more disturbing — we've gotten used to the idea that simultaneous 
events cannot affect each other because light cannot be exchanged between them. It then seems that we could send 
information faster than light if this assumption were false — people have discussed many questions here, and it’s worth 
thinking about. But it turns out at the end of the day that there isn’t a way to get real information faster than the 
speed of light. 

So let's now review the EPR thought experiments and try to see how they relate to the two assumptions we're 


trying to make. 
Example 236


Suppose Alice and Bob are measuring states, both along the z-axis, in such a way that if Alice measures spin up, 


Bob measures spin down and vice versa. 


This is some kind of a correlation between Alice and Bob’s experiments, and it seems like we'd know information 
from Alice’s experiment about Bob’s experiment. But EPR claims that when we do this experiment, we've already 
created entangled particles with definite spin vectors. Specifically, the claim would be that 50 percent of Alice's 
particles are definitely spin up (so Bob’s corresponding particles are spin down), and the other 50 percent of her 
particles are spin down. This does indeed give us correlation, and what EPR says is that there is no quantum 
superposition there! 

Mathematically, there isn’t a problem here — we're just claiming that the (definite) spins depend on some hidden 


variables that we don’t know. So let’s look at a more complicated example: 


Example 237 


Suppose Alice and Bob each have two Stern-Gerlach machines, one in the z and one in the x-direction. 


Einstein would say that in such an example, we shouldn’t talk about making one measurement after the other: we can measure either in z or in x, and there will be a definite answer for each particle’s spin. Let’s say for example that we have a particle $(\hat{z}+, \hat{x}-)$: in other words, if we measure the z-spin, we get $+$, and if we measure the x-spin, we get $-$. So EPR is saying here that the particles look like this instead of a strange superposition of $|+\rangle$ and $|-\rangle$: there is some reality, and we measure that reality for each particle.

So suppose we have entangled particles for Alice and Bob: say that Alice's particle is $(\hat{z}+, \hat{x}+)$. Then Bob's corresponding entangled particle must look like $(\hat{z}-, \hat{x}-)$. Indeed, we'd find that we would have correlation if we measure in either the z- or the x-direction. Similarly, if Alice’s particle is $(\hat{z}-, \hat{x}+)$, Bob’s would be $(\hat{z}+, \hat{x}-)$, and we can also produce pairs of particles with $(\hat{z}+, \hat{x}-)$ and $(\hat{z}-, \hat{x}+)$ or with $(\hat{z}-, \hat{x}-)$ and $(\hat{z}+, \hat{x}+)$. There are four different possibilities here, and what EPR is saying is that 25 percent of the entangled pairs that are formed are of each type.


And now we can ask EPR some questions: for example, the probability

$$P(\hat{z}+_A,\ \hat{z}-_B)$$

(which is the probability Alice measures $+$ and Bob measures $-$, both along z) is 50 percent, because there are two of the four cases which correspond to this reality. (And this is the same prediction that we would make from the entangled state formulation.) Similarly, we can ask for the probability

$$P(\hat{z}+_A,\ \hat{x}+_B),$$

and this time there is only one of the four cases that works: thus the probability is 25 percent. Again, this matches our quantum results, and it seems like everything is consistent.
So everything so far has not required quantum mechanics at all: it wasn’t until Bell that we tried three directions 


and made a breakthrough towards disproving the EPR theory! 


Example 238 


Now Alice and Bob have three Stern-Gerlach machines in directions $a, b, c$, and our particles now need to be labeled with a $+$ or $-$ label for each of $a, b, c$.

So an example of a label for a particle would be $(a+, b-, c+)$: in other words, measuring in the a-direction yields a spin of $\frac{\hbar}{2}$, measuring in the b-direction yields $-\frac{\hbar}{2}$, and measuring in the c-direction yields $\frac{\hbar}{2}$. So we're doing a single measurement here (not doing anything with simultaneity), and we'll always ask for probabilities of events like “Alice measures $a+$ and Bob measures $c+$.”


Let's quickly list the different possibilities for our entangled particles:

Population | Particle 1 (Alice) | Particle 2 (Bob)
$N_1$ | (a+, b+, c+) | (a−, b−, c−)
$N_2$ | (a+, b+, c−) | (a−, b−, c+)
$N_3$ | (a+, b−, c+) | (a−, b+, c−)
$N_4$ | (a+, b−, c−) | (a−, b+, c+)
$N_5$ | (a−, b+, c+) | (a+, b−, c−)
$N_6$ | (a−, b+, c−) | (a+, b−, c+)
$N_7$ | (a−, b−, c+) | (a+, b+, c−)
$N_8$ | (a−, b−, c−) | (a+, b+, c+)

(The particles that Alice and Bob have always have different measurements along each of $a, b, c$; that’s the way that they're correlated.) It might seem like we want to put $\frac{1}{8}$ of the particles in each of these states, but the argument that we'll be making here doesn’t require this: let’s say that there are $N$ total particle pairs in our system, and there are $N_1, N_2, \cdots, N_8$ particle pairs in the eight states of our table above.

What we're going to do is run into a contradiction: we want to show that the quantum mechanical formula $\frac{1}{2}\sin^2\frac{\theta_{ab}}{2}$ cannot be reproduced by this model for all of the different measurements we can try. The idea is to try to combine the three directions into a single equation. First of all,

$$P(a+, b+) = \frac{N_3 + N_4}{N},$$

because only the third and fourth cases have the first particle in the $a+$ state and the second particle in the $b+$ state. Similarly,

$$P(a+, c+) = \frac{N_2 + N_4}{N}$$

and

$$P(c+, b+) = \frac{N_3 + N_7}{N}.$$

Now we can make a silly-looking inequality: because $N_3 + N_4 \le N_3 + N_7 + N_4 + N_2$, we can divide that by $N$, and we now know that under the assumption of local realism, we have

$$P(a+, b+) \le P(a+, c+) + P(c+, b+).$$


This is Bell’s inequality — we've turned an assumption of realism into a mathematical fact! We didn’t write down 
specific probabilities here, but what's interesting is that we can pick any populations that we want here, and we can 
try to use any a, b,c to get a contradiction. But we know that quantum mechanics has a formula for each of these 


expressions in Bell's inequality! If quantum mechanics is true, then the left hand side is equal to 


1 id 
P(at, b+) = 5 sin” oe 


while the right hand side is 


1 6 ) 
P(at+,ct+)4 P(c+, b+) = 5 (si? AC 4 sin? sc). 


And it's actually pretty easy to find vectors $a, b, c$ such that the left hand side here is larger than the right hand side: put them all in the same plane with $c$ between $a$ and $b$, such that there is an angle $\theta$ between $a$ and $c$, as well as between $c$ and $b$. Then local realism claims that

\[ \frac{1}{2} \sin^2 \frac{2\theta}{2} \le \frac{1}{2} \sin^2 \frac{\theta}{2} \cdot 2, \quad \text{that is,} \quad \frac{1}{2} \sin^2 \theta \le \sin^2 \frac{\theta}{2}. \]

And if we make $\theta$ sufficiently small, this inequality is not satisfied: the left side is approximately $\frac{\theta^2}{2}$, while the right hand side is approximately $\frac{\theta^2}{4}$. In general, this inequality fails for any $\theta < \frac{\pi}{2}$ (writing $\sin^2\theta = 4\sin^2\frac{\theta}{2}\cos^2\frac{\theta}{2}$, the inequality is equivalent to $\cos^2\frac{\theta}{2} \le \frac{1}{2}$)! So we now have a measurement that we can do in quantum mechanics with correlated entangled particles, and this actually contradicts local realism. Therefore, what this tells us is that there's no way to use hidden variables to get around the issues of quantum mechanics: local realism is incorrect.
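The violation can be checked numerically. The following is a minimal sketch, assuming the coplanar arrangement described above (with $c$ halfway between $a$ and $b$); the function names `p_quantum` and `bell_gap` are made up for this illustration.

```python
import math

def p_quantum(theta):
    """Singlet-state probability P(a+, b+) for axes separated by angle theta."""
    return 0.5 * math.sin(theta / 2) ** 2

def bell_gap(theta):
    """Right side minus left side of P(a+,b+) <= P(a+,c+) + P(c+,b+),
    where c makes an angle theta with both a and b (so a, b are 2*theta apart)."""
    lhs = p_quantum(2 * theta)
    rhs = p_quantum(theta) + p_quantum(theta)
    return rhs - lhs

for deg in (10, 45, 80, 90, 120):
    gap = bell_gap(math.radians(deg))
    status = "violated" if gap < -1e-12 else "satisfied"
    print(f"theta = {deg:3d} deg: rhs - lhs = {gap:+.4f} ({status})")
```

A negative gap means the quantum prediction breaks the inequality; this happens for angles below 90 degrees and stops exactly at $\theta = \pi/2$.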


Fact 239 


And Alain Aspect and others did the physical experiments in the 1980s, and this confirmed that Bell's inequality 


is indeed violated. 


We'll finish this lecture by discussing angular momentum and an elegant vector notation that will help us understand this better. Let's summarize some of the things that we already know about it! The orbital angular momentum operators are defined to be

\[ \hat L_x = \hat y \hat p_z - \hat z \hat p_y, \quad \hat L_y = \hat z \hat p_x - \hat x \hat p_z, \quad \hat L_z = \hat x \hat p_y - \hat y \hat p_x. \]

In many cases, it's better to use labels $\hat x_1, \hat x_2, \hat x_3$ instead of $\hat x, \hat y, \hat z$ (and analogously for $\hat p$). That's because we can write commutation relations like
\[ [\hat x_i, \hat p_j] = i\hbar \delta_{ij}, \quad [\hat x_i, \hat x_j] = [\hat p_i, \hat p_j] = 0. \]
But the main idea we want to explore here is using vector notation. There are two ways to do this: we can construct triplets of objects, which are vectors, or we can form the vectors ourselves. This second option often leads to objects that are a bit confusing, but we try our best to avoid this. For example, instead of thinking of the $\hat{\mathbf r}$ operator as $(\hat x, \hat y, \hat z)$, we'll write
\[ \hat{\mathbf r} = \hat x \mathbf e_1 + \hat y \mathbf e_2 + \hat z \mathbf e_3 = \hat x_1 \mathbf e_1 + \hat x_2 \mathbf e_2 + \hat x_3 \mathbf e_3. \]

We should understand that the basis vectors $\mathbf e_i$ are useful for writing expressions instead of triplets, but that they're not really interacting with the $\hat x, \hat y, \hat z$ operators. Similarly, we can define a momentum operator

\[ \hat{\mathbf p} = \hat p_1 \mathbf e_1 + \hat p_2 \mathbf e_2 + \hat p_3 \mathbf e_3, \]

as well as the angular momentum operator
\[ \hat{\mathbf L} = \hat L_1 \mathbf e_1 + \hat L_2 \mathbf e_2 + \hat L_3 \mathbf e_3. \]

These vectors are unusual, because their components are operators, not numbers. So we need to understand how these operator vectors can be manipulated: we'll define the dot product and cross product like we usually do, though we need to be careful not to make mistakes.
Definition 240

Let $\mathbf a, \mathbf b$ be vector operators. The dot product of the two operators is defined to be

\[ \mathbf a \cdot \mathbf b = \sum_i a_i b_i. \]

Similarly, the cross product of the two operators is a vector operator defined via

\[ (\mathbf a \times \mathbf b)_i = \varepsilon_{ijk} a_j b_k. \]

The order matters here: the components of $\mathbf a$ and $\mathbf b$ are operators, so they might not commute.


Definition 241 


For a vector-valued operator $\mathbf a$, define $\mathbf a^2 = \mathbf a \cdot \mathbf a = \sum_i a_i a_i$.


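We can now start doing some calculations. For example, in general
\[ \mathbf a \cdot \mathbf b \ne \mathbf b \cdot \mathbf a, \]
because our vectors now have non-commuting operators as components: specifically, we know that
\[ \mathbf a \cdot \mathbf b = a_i b_i = [a_i, b_i] + b_i a_i, \]

where we're using the repeated index convention. But $b_i a_i$ is $\mathbf b \cdot \mathbf a$, so we get a formula

\[ \boxed{\mathbf a \cdot \mathbf b = \mathbf b \cdot \mathbf a + [a_i, b_i]}. \]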


Example 242 
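Plugging in the operators $\hat{\mathbf r}, \hat{\mathbf p}$, we find that

\[ \hat{\mathbf r} \cdot \hat{\mathbf p} = \hat{\mathbf p} \cdot \hat{\mathbf r} + [\hat x_i, \hat p_i] = \hat{\mathbf p} \cdot \hat{\mathbf r} + 3i\hbar. \]

(We should remember that the commutator $[\hat x_i, \hat p_i]$ is summed over $i$, so we pick up a factor of 3.) This means that the dot product is no longer symmetric, and similarly the cross product is no longer antisymmetric! Indeed,
\[ (\mathbf a \times \mathbf b)_i = \varepsilon_{ijk} a_j b_k = \varepsilon_{ijk} \left( [a_j, b_k] + b_k a_j \right). \]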


The first term here stays put, and we can swap the indices $j$ and $k$ in the $\varepsilon_{ijk}$ on the second term, picking up a minus sign: thus

\[ (\mathbf a \times \mathbf b)_i = -\varepsilon_{ikj} b_k a_j + \varepsilon_{ijk} [a_j, b_k] = \boxed{-(\mathbf b \times \mathbf a)_i + \varepsilon_{ijk} [a_j, b_k]}. \]

In other words, the cross product will no longer be antisymmetric unless we're lucky.


Example 243 


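If we try to compute $\hat{\mathbf r} \times \hat{\mathbf r}$, we can plug it into our identity above to find that

\[ (\hat{\mathbf r} \times \hat{\mathbf r})_i = -(\hat{\mathbf r} \times \hat{\mathbf r})_i + \varepsilon_{ijk} [\hat x_j, \hat x_k] = -(\hat{\mathbf r} \times \hat{\mathbf r})_i + 0, \]

so we do have $\hat{\mathbf r} \times \hat{\mathbf r} = 0$.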
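We can similarly find that $\hat{\mathbf p} \times \hat{\mathbf p} = 0$, but something like $\hat{\mathbf L} \times \hat{\mathbf L}$ will not be zero! This is because $[\hat L_j, \hat L_k]$ is nonzero, and we'll end up finding that
\[ \hat{\mathbf L} \times \hat{\mathbf L} = i\hbar \hat{\mathbf L}. \]
On the other hand, if we try computing
\[ (\hat{\mathbf r} \times \hat{\mathbf p})_i = -(\hat{\mathbf p} \times \hat{\mathbf r})_i + \varepsilon_{ijk} [\hat x_j, \hat p_k] = -(\hat{\mathbf p} \times \hat{\mathbf r})_i + i\hbar\, \varepsilon_{ijk} \delta_{jk}, \]

the last term is zero. It's true because $\delta_{jk}$ requires $j$ and $k$ to be the same while $\varepsilon_{ijk}$ requires them to be different, but a more general principle is that contracting an antisymmetric object with a symmetric object will yield zero! (To show that, we can just relabel $j$ and $k$ by swapping: then we get the same quantity with a negative sign.) Therefore, we actually have

\[ (\hat{\mathbf r} \times \hat{\mathbf p})_i = -(\hat{\mathbf p} \times \hat{\mathbf r})_i \implies \boxed{\hat{\mathbf r} \times \hat{\mathbf p} = -\hat{\mathbf p} \times \hat{\mathbf r}}, \]

and this is the object that we call the angular momentum.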
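The claim that a vector operator crossed with itself need not vanish can be checked on the smallest example. This is a sketch only: it uses the spin-1/2 matrices $S_i = \frac{\hbar}{2}\sigma_i$ (which obey the same commutation relations as $\hat L_i$) as a stand-in, sets $\hbar = 1$, and the helper names `levi_civita` and `cross` are invented here.

```python
import numpy as np

hbar = 1.0  # units chosen so that hbar = 1 (an assumption of this sketch)
sigma = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
S = [0.5 * hbar * s for s in sigma]

def levi_civita(i, j, k):
    """Totally antisymmetric symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) / 2

def cross(a, b):
    """(a x b)_i = eps_ijk a_j b_k, keeping the operator order a before b."""
    return [
        sum(levi_civita(i, j, k) * a[j] @ b[k] for j in range(3) for k in range(3))
        for i in range(3)
    ]

s_cross_s = cross(S, S)
for i in range(3):
    print(i, np.allclose(s_cross_s[i], 1j * hbar * S[i]))
```

Each component of `cross(S, S)` matches $i\hbar S_i$, which is nonzero, in contrast with the classical $\mathbf v \times \mathbf v = 0$.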


Example 244 


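We know that the angular momentum is classically perpendicular to both $\mathbf r$ and $\mathbf p$. Is this true in the quantum mechanical case?

We'll first compute $\hat{\mathbf r} \cdot \hat{\mathbf L}$. By definition, we know that

\[ \hat{\mathbf r} \cdot \hat{\mathbf L} = \hat x_i \hat L_i = \varepsilon_{ijk} \hat x_i \hat x_j \hat p_k, \]

and now the operators $\hat x_i$ and $\hat x_j$ commute, so $\hat x_i \hat x_j$ is symmetric in $i$ and $j$, but $\varepsilon_{ijk}$ is antisymmetric, so the whole expression will collapse to 0. Therefore, $\hat{\mathbf r} \cdot \hat{\mathbf L}$ is indeed 0.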


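On the other hand, let's find $\hat{\mathbf p} \cdot \hat{\mathbf L}$. There are two ways to do this problem, and we'll do it by writing out the indices:
\[ \hat{\mathbf p} \cdot \hat{\mathbf L} = \varepsilon_{ijk} \hat p_i \hat x_j \hat p_k. \]

There is a temptation to say that the operator part of this is symmetric in $i$ and $k$, because there are two operators $\hat p_i$ and $\hat p_k$, but this is incorrect! We have to move the $\hat p$ operators together, and there's an $\hat x$ operator in the middle that might screw things up. So we'll be a bit more careful: this evaluates to
\[ \varepsilon_{ijk} \hat p_i \hat x_j \hat p_k = \varepsilon_{ijk} \hat x_j \hat p_i \hat p_k \]

because the commutator of $\hat p_i$ and $\hat x_j$ is proportional to $\delta_{ij}$, which vanishes against $\varepsilon_{ijk}$, and now it is okay to say that $\varepsilon_{ijk}$ is antisymmetric in $i$ and $k$, while the rest of the expression is symmetric in $i$ and $k$: thus everything vanishes and we're left with zero again.

Remark 245. (Note that we could have also used that $\hat{\mathbf L} = -\hat{\mathbf p} \times \hat{\mathbf r}$, which would have simplified things a lot.)

So now we know that $\hat{\mathbf r} \cdot \hat{\mathbf L} = \hat{\mathbf p} \cdot \hat{\mathbf L} = 0$, and doing a few analogous calculations allows us to find that $\hat{\mathbf L} \cdot \hat{\mathbf r} = \hat{\mathbf L} \cdot \hat{\mathbf p} = 0$ as well.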


30 April 8, 2020 


Concepts of multiparticle states and tensor products have been coming at us pretty quickly! We'll try to talk about 


some of the important ideas. 
The idea of a tensor product is both physical and mathematical: there are lots of physical ideas that are reflected in the way we construct the mathematical axioms. Recall that we are trying to create a new space $V \otimes W$ from two vector spaces $V, W$, and the initial idea is that we want to just write down ordered pairs $(v, w)$, where $v \in V$ and $w \in W$, and make these the objects of our new vector space. But on its own, this doesn't give us very much insight, and it doesn't actually reflect the physics that we care about here: this construction gives what is called the direct product instead.

One of the main problems is the multiplicative structure of this new vector space: in a direct product, the vectors 
(v, w) and (av, w) are linearly independent. But in quantum mechanics, we have a single wavefunction for any system 
— even if we have two or three particles, there's still just one wavefunction. So picking up multiplicative factors of a 
independently in the v- and w-entries means we're constructing “separate” wavefunctions for v and w, which we don’t 
like. 

So that motivates us to say that 


a(v, w) = (av, w) = (v, aw) 


for any complex number $a$ (there's no complex conjugation here, since we treat $V$ and $W$ the same, so this isn't like having a dual space). And now we introduce the notation $v \otimes w$ instead of $(v, w)$ to emphasize that we're putting the vectors in $V$ and $W$ together with a kind of structure.

And once we introduce linearity in the form
\[ (v_1 + v_2) \otimes w = v_1 \otimes w + v_2 \otimes w, \]
\[ v \otimes (w_1 + w_2) = v \otimes w_1 + v \otimes w_2, \]

we have a rigorous definition of the tensor product space! In general, since each $v \otimes w$ is an element of $V \otimes W$, we can also have objects of the form
\[ \sum_i v_i \otimes w_i \in V \otimes W. \]
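For finite-dimensional spaces, these rules can be tried out concretely. The sketch below assumes that column vectors represent $v$ and $w$ and that NumPy's Kronecker product `np.kron` plays the role of $\otimes$; the dimensions 2 and 3 and the random vectors are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_vec(n):
    """Random complex vector of length n (illustrative data only)."""
    return rng.normal(size=n) + 1j * rng.normal(size=n)

v, v1, v2 = (random_vec(2) for _ in range(3))   # vectors in V, dim 2
w, w1, w2 = (random_vec(3) for _ in range(3))   # vectors in W, dim 3
a = 0.7 - 1.3j

# a (v (x) w) = (a v) (x) w = v (x) (a w), with no complex conjugation
scalar_ok = (np.allclose(a * np.kron(v, w), np.kron(a * v, w))
             and np.allclose(a * np.kron(v, w), np.kron(v, a * w)))
# linearity in each slot
left_ok = np.allclose(np.kron(v1 + v2, w), np.kron(v1, w) + np.kron(v2, w))
right_ok = np.allclose(np.kron(v, w1 + w2), np.kron(v, w1) + np.kron(v, w2))

print(scalar_ok, left_ok, right_ok)
print(np.kron(v, w).shape)  # dim(V) * dim(W) components
```

The product space has $2 \cdot 3 = 6$ components, whereas a direct product of the same spaces would have $2 + 3 = 5$.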


Fact 246 


In relativity, there are objects called tensors (which have indices and transform in specific ways). They have some 


relations with tensor products, but it’s not very immediate — the tensor products we're writing here represent 


objects with two indices. 


To be more specific, we can form a basis for our tensor product space $V \otimes W$ of the form $\{e_i \otimes f_j\}$, where the $e_i$ are basis elements of $V$ and the $f_j$ are basis elements of $W$. So a general vector in this space will look like
\[ \sum_{i,j} h^{ij} (e_i \otimes f_j), \]

and now the object $h^{ij}$ can be thought of as a two-index tensor. (The position of the indices being “up” instead of “down” is more important when we discuss transformations, though it's not important for our purposes here.) And if we rotate the basis vectors, that will affect the components $h^{ij}$, and then we'll need to think a bit more about how tensors transform.


Recall from last lecture that we defined the operator

\[ H_T = H_1 \otimes 1 + 1 \otimes H_2, \]
and we found that we could write the time-evolution operator

\[ e^{-iH_T t/\hbar} = e^{-iH_1 t/\hbar} \otimes e^{-iH_2 t/\hbar}. \]

Let's consider some wavefunction in our tensor product space: naively, it might be of the form $\psi_1 \otimes \psi_2$. (Practically, we end up dropping the tensor symbol.) If we apply the time-evolution operator, it seems like the first factor acts on the first wavefunction, and the second factor acts on the second wavefunction. But this isn't quite precise because of entanglement! A general wavefunction might look like
\[ \sum_i \psi_i \otimes \phi_i, \]

constructed in such a way that we might not be able to rewrite it as $\psi_1 \otimes \psi_2$ at all. So it doesn't make sense to think of an operator as acting on the individual components of the tensor product, even when we have a Hamiltonian which is acting separately on the two parts.
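Both statements can be illustrated with small matrices. The following sketch assumes random Hermitian matrices as stand-ins for $H_1$ and $H_2$, units with $\hbar = 1$, and uses the rank of the reshaped coefficient matrix to detect whether a state is a single product; the helper names are hypothetical.

```python
import numpy as np

def random_hermitian(n, rng):
    """Random n x n Hermitian matrix (illustrative Hamiltonian)."""
    m = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (m + m.conj().T) / 2

def evolution(h, t, hbar=1.0):
    """exp(-i h t / hbar) for Hermitian h, built from its eigendecomposition."""
    evals, evecs = np.linalg.eigh(h)
    return evecs @ np.diag(np.exp(-1j * evals * t / hbar)) @ evecs.conj().T

def schmidt_rank(psi, dim1, dim2):
    """Minimum number of product terms needed to write psi in V (x) W."""
    return np.linalg.matrix_rank(np.asarray(psi).reshape(dim1, dim2))

rng = np.random.default_rng(1)
h1, h2 = random_hermitian(2, rng), random_hermitian(3, rng)
h_total = np.kron(h1, np.eye(3)) + np.kron(np.eye(2), h2)
t = 0.8

factorized = np.allclose(evolution(h_total, t),
                         np.kron(evolution(h1, t), evolution(h2, t)))
print("U_T = U_1 (x) U_2:", factorized)

product_state = np.kron([1, 0], [0, 1, 0])
entangled = (np.kron([1, 0], [1, 0, 0]) + np.kron([0, 1], [0, 1, 0])) / np.sqrt(2)
print(schmidt_rank(product_state, 2, 3), schmidt_rank(entangled, 2, 3))
```

A rank of 1 means the state is a single product $\psi_1 \otimes \psi_2$; the second state has rank 2, so no such rewriting exists, even though the evolution operator itself factorizes.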

And tensor products are often used to combine different properties of a single particle: for example, an electron with a position wavefunction might also be a spin 1/2 particle, so we'd have to deal with terms like
\[ \psi(x) \otimes \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}. \]
It's good to remember that Hamiltonians on this tensor product space might make the spin and the position function interact in complicated ways!
Let's take a moment now to look at Bell's inequality, statistical mixtures, and EPR. Everything starts with our singlet spin state

\[ |\psi\rangle = \frac{1}{\sqrt 2} \bigl( |+\rangle |-\rangle - |-\rangle |+\rangle \bigr). \]

This is going to show up when we study angular momentum soon: what's important about it is that the total angular momenta $S_x, S_y, S_z$ are all zero (this requires a bit of computation), which means that we can write it as

\[ |\psi\rangle = \frac{1}{\sqrt 2} \bigl( |\mathbf n; +\rangle |\mathbf n; -\rangle - |\mathbf n; -\rangle |\mathbf n; +\rangle \bigr) \]

for any direction $\mathbf n$: since we're getting the same state in any direction, it's a rotationally invariant state! So it's very nice to work with and analyze, and now let's turn to the quantity
\[ P(\mathbf a, \mathbf b). \]


To explain what this means, suppose Alice and Bob have the two particles of the singlet state, and Alice measures along $\mathbf a$ while Bob measures along $\mathbf b$. Then we're defining the probability above to be the probability that Alice measures a state along $+\mathbf a$ (instead of $-\mathbf a$), and Bob measures a state along $+\mathbf b$ (instead of $-\mathbf b$). We've derived the value of this before: it's $\frac{1}{2} \sin^2 \frac{\theta_{ab}}{2}$, where $\theta_{ab}$ is the angle between the vectors $\mathbf a$ and $\mathbf b$.

For example, if the angle is 180°, so that Alice and Bob measure along $\mathbf n$ and $-\mathbf n$ respectively (which is basically measuring along the same axis), the two particles will always be in opposite directions, so either they will both measure (+) or they will both measure (−) (because their axes point in opposite directions), each with probability $\frac{1}{2}$. As a different example, if Alice and Bob measure along the $\hat x$ and $\hat z$ directions, we have that $P(\hat x, \hat z) = \frac{1}{2} \sin^2 \frac{\pi}{4} = \frac{1}{4}$. In this case, the measurements are essentially independent for Alice and Bob.
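The formula for $P(\mathbf a, \mathbf b)$ can be reproduced directly from the singlet state vector. This is a sketch under stated assumptions: spin-up along a unit vector $\mathbf n$ is represented by the standard spinor $(\cos\frac{\theta}{2}, e^{i\phi}\sin\frac{\theta}{2})$ in the $z$ basis, and the function names are chosen for this example only.

```python
import numpy as np

def spin_up_along(n):
    """Spin-1/2 state with spin +hbar/2 along the unit vector n (z basis)."""
    theta = np.arccos(n[2])
    phi = np.arctan2(n[1], n[0])
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

up = np.array([1, 0], dtype=complex)
down = np.array([0, 1], dtype=complex)
singlet = (np.kron(up, down) - np.kron(down, up)) / np.sqrt(2)

def prob_plus_plus(a, b):
    """Probability that Alice finds + along a and Bob finds + along b."""
    outcome = np.kron(spin_up_along(a), spin_up_along(b))
    return abs(np.vdot(outcome, singlet)) ** 2

x_axis, z_axis = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(prob_plus_plus(x_axis, z_axis))            # compare with (1/2) sin^2(pi/4)
print(prob_plus_plus(z_axis, (0.0, 0.0, -1.0)))  # axes 180 degrees apart
print(prob_plus_plus(z_axis, z_axis))            # same axis
```

The three printed values can be compared with $\frac{1}{2}\sin^2\frac{\theta_{ab}}{2}$ at $\theta_{ab} = 90°$, $180°$, and $0°$.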

The statistical mixtures idea from Einstein is basically claiming that we don't actually need any of this probability idea: when we have a bunch of entangled states of this singlet state form, EPR claims that the results that we see are already inherent in the particles before measurement. For example, a $(\hat z, -\hat x)$ particle has attributes that are deterministically prepared, and whenever we put it into a z-direction Stern-Gerlach machine, it will always come out with a $+\hat z$.

To make this consistent with an equation like $P(\hat x, \hat z) = \frac{1}{4}$ (which is experimentally verified), we need to give a different explanation than quantum mechanics does: instead, we say that one-quarter of the particle pairs in our ensemble of entangled states have particle 1 in the $(\hat z, \hat x)$ state and particle 2 in the $(-\hat z, -\hat x)$ state. And this is consistent with the fact that whenever we measure the two particles along some given axis, they are in opposite directions (the signs are correlated). And we'll also need to have another quarter of our particle pairs have particle 1 in the $(\hat z, -\hat x)$ state and particle 2 in the $(-\hat z, \hat x)$ state, and so on. If we set up such an ensemble, this EPR model is indeed consistent with our quantum mechanical observations: there's only one of these four groups such that Alice would measure $-\hat z$ and Bob would measure $+\hat x$, and so on.

The EPR model holds up well whenever we look at only two directions: we can set up an ensemble of particle pairs along any two directions $\mathbf a, \mathbf b$. It's not until we introduce a third direction that the problem comes up!
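The four-group ensemble can be enumerated explicitly. The sketch below assumes the model exactly as described (four equally likely groups, with particle 2 always carrying the opposite labels of particle 1); the dictionary representation and the function name `prob` are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Particle 1 carries definite answers for the z and x axes; one group per sign choice.
groups = [{"z": sz, "x": sx} for sz, sx in product((+1, -1), repeat=2)]
weight = Fraction(1, len(groups))  # each group is one quarter of the ensemble

def prob(alice_axis, alice_result, bob_axis, bob_result):
    """Fraction of pairs where Alice (particle 1) and Bob (particle 2) get these results."""
    total = Fraction(0)
    for particle1 in groups:
        particle2 = {axis: -value for axis, value in particle1.items()}
        if particle1[alice_axis] == alice_result and particle2[bob_axis] == bob_result:
            total += weight
    return total

print(prob("x", +1, "z", +1))  # Alice +x, Bob +z
print(prob("z", +1, "z", +1))  # same axis, same sign
print(prob("z", +1, "z", -1))  # same axis, opposite signs
```

With only two axes in play, these fractions agree with the quantum values $\frac{1}{4}$, $0$, and $\frac{1}{2}$.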


We'll finish with an idea regarding operators on a tensor product space: we claim that
\[ \mathcal L(U \otimes V) = \mathcal L(U) \otimes \mathcal L(V). \]

This will require a lot of thinking if we haven't seen a tensor product space before. One point that might be puzzling: if $\dim U = \dim V$, we can consider the swap operator
\[ S(u \otimes v) = v \otimes u. \]

If the two vector spaces have the same dimension, this is a valid operator. But how can an operator in $\mathcal L(U) \otimes \mathcal L(V)$ swap vectors between the vector spaces?


31 Angular Momentum, Part 1 
Last time, we introduced the quantity of angular momentum, which we showed could be written as
\[ \hat{\mathbf L} = \hat{\mathbf r} \times \hat{\mathbf p} = -\hat{\mathbf p} \times \hat{\mathbf r}. \]


When we work with angular momentum, we often think about how a vector behaves with rotations. 


Definition 247 


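A vector operator $\hat{\mathbf u}$ is a vector under rotation if

\[ [\hat L_i, \hat u_j] = i\hbar\, \varepsilon_{ijk} \hat u_k. \]

We've verified that $\hat{\mathbf r}$ and $\hat{\mathbf p}$ are indeed vectors under rotation in our homework. This gives us an important theorem: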


Theorem 248 


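If $\hat{\mathbf u}, \hat{\mathbf v}$ are vectors under rotations, then $\hat{\mathbf u} \cdot \hat{\mathbf v}$ is a scalar and $\hat{\mathbf u} \times \hat{\mathbf v}$ is a vector, both under rotation.


Recall here that $\hat{\mathbf u} \cdot \hat{\mathbf v}$ being a scalar means that

\[ [\hat L_i, \hat{\mathbf u} \cdot \hat{\mathbf v}] = 0. \]

This, for example, shows that if we plug in $\hat{\mathbf u}, \hat{\mathbf v} = \hat{\mathbf r}$ or $\hat{\mathbf p}$, we have that
\[ [\hat L_i, \hat{\mathbf r}^2] = [\hat L_i, \hat{\mathbf p}^2] = [\hat L_i, \hat{\mathbf r} \cdot \hat{\mathbf p}] = 0. \]
We can also plug into the equation
\[ [\hat L_i, (\hat{\mathbf u} \times \hat{\mathbf v})_j] = i\hbar\, \varepsilon_{ijk} (\hat{\mathbf u} \times \hat{\mathbf v})_k. \]
For example, we can plug in $\hat{\mathbf u} = \hat{\mathbf r}$, $\hat{\mathbf v} = \hat{\mathbf p}$ (to yield the vector $\hat{\mathbf L}$), and we'll find that
\[ [\hat L_i, \hat L_j] = i\hbar\, \varepsilon_{ijk} \hat L_k. \]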


This time, we didn't need to verify the complicated calculations by moving the $\hat x$s and $\hat p$s past each other: we just used the theorem above! And now that $\hat{\mathbf L}$ is a vector under rotations, we know that

\[ [\hat L_i, \hat{\mathbf L}^2] = 0 \]

(because $\hat{\mathbf L}^2 = \hat{\mathbf L} \cdot \hat{\mathbf L}$ is a scalar). This last property is very important, and we can indeed check that it works by using the algebra directly (we're encouraged to try this out ourselves).
Remark 249. We know that the spins have basically the same algebra:
\[ [S_i, S_j] = i\hbar\, \varepsilon_{ijk} S_k, \]
and in that case (for spin 1/2) $\mathbf S^2$ is just $S_x^2 + S_y^2 + S_z^2 = \frac{3}{4} \hbar^2 \mathbb 1$. So in that case, it was clear that $S_i$ always commutes with $\mathbf S^2$.


The point is that whenever we look at any kind of angular momentum, we'll use a generic name $\mathbf J$. We'll always have the algebra of angular momentum
\[ [J_i, J_j] = i\hbar\, \varepsilon_{ijk} J_k, \]

and that will let us extract the properties of this algebra alone, rather than the specific physics of the system. We'll always have
\[ [J_i, \mathbf J^2] = [J_i, J_1^2 + J_2^2 + J_3^2] = 0. \]

This algebra also tells us (in fact equivalently) that
\[ \mathbf J \times \mathbf J = i\hbar \mathbf J, \]

and in fact a vector $\mathbf u$ being a vector under rotation tells us that

\[ \boxed{\mathbf J \times \mathbf u + \mathbf u \times \mathbf J = 2i\hbar\, \mathbf u}. \]

(To derive this, we just expand out the left hand side using index notation.)

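Now that we've established some basic identities for vectors under rotations, let's move on to the question of computing

\[ (\mathbf a \times \mathbf b)^2 = (\mathbf a \times \mathbf b) \cdot (\mathbf a \times \mathbf b). \]
We have classical formulas for this, but there are some correction terms: it's not $\mathbf a^2 \mathbf b^2 - (\mathbf a \cdot \mathbf b)^2$, which is the quantity that we get if we do things classically. To understand what the extra terms are, we'll look at the special case of $\mathbf a \times \mathbf b = \hat{\mathbf r} \times \hat{\mathbf p}$: indeed, we have
\[ \hat{\mathbf L}^2 = \hat{\mathbf r}^2 \hat{\mathbf p}^2 - (\hat{\mathbf r} \cdot \hat{\mathbf p})^2 + i\hbar\, \hat{\mathbf r} \cdot \hat{\mathbf p}. \]
We can then solve for $\hat{\mathbf p}^2$ to find
\[ \hat{\mathbf p}^2 = \frac{1}{r^2} \left( (\hat{\mathbf r} \cdot \hat{\mathbf p})^2 - i\hbar (\hat{\mathbf r} \cdot \hat{\mathbf p}) + \hat{\mathbf L}^2 \right). \]

(Note here that we've divided by $r^2$ by multiplying by $\frac{1}{r^2}$ on the left instead of the right.) But now remember that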
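$\hat{\mathbf p} = \frac{\hbar}{i} \nabla$, which means that $\hat{\mathbf r} \cdot \hat{\mathbf p} = \frac{\hbar}{i} r \frac{\partial}{\partial r}$ (since we plug in $\mathbf r = r\, \mathbf e_r$, with $\mathbf e_r$ the radial unit vector). We can then simplify the first two terms on the right hand side above, and we end up with

\[ \hat{\mathbf p}^2 = -\hbar^2 \frac{1}{r} \frac{\partial^2}{\partial r^2} r + \frac{1}{r^2} \hat{\mathbf L}^2. \]

(We can verify that this is the correct operator by applying it to a test function $f(r)$.) But we also know that
\[ \hat{\mathbf p}^2 = -\hbar^2 \nabla^2 \]
is proportional to the Laplacian, which we can expand out as
\[ \hat{\mathbf p}^2 = -\hbar^2 \left[ \frac{1}{r} \frac{\partial^2}{\partial r^2} r + \frac{1}{r^2} \left( \frac{1}{\sin\theta} \frac{\partial}{\partial\theta} \sin\theta \frac{\partial}{\partial\theta} + \frac{1}{\sin^2\theta} \frac{\partial^2}{\partial\phi^2} \right) \right]. \]

Comparing these two expressions for $\hat{\mathbf p}^2$ gives us an explicit formula for $\hat{\mathbf L}^2$: we have the scalar operator

\[ \boxed{\hat{\mathbf L}^2 = -\hbar^2 \left( \frac{1}{\sin\theta} \frac{\partial}{\partial\theta} \sin\theta \frac{\partial}{\partial\theta} + \frac{1}{\sin^2\theta} \frac{\partial^2}{\partial\phi^2} \right)}. \]

This means $\hat{\mathbf L}^2$ only depends on the angular variables, and this makes sense intuitively: it generates rotations, so it shouldn't change $r$, and we can say that it acts on functions on the unit sphere! And this isn't something that we can easily find by direct computation: we would have had to write this out in terms of $\hat L_x, \hat L_y, \hat L_z$ and then subsequently write this in terms of $\hat x$ and $\hat p$ and simplify to angular variables.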

What's important is that this gives us an understanding of the Hamiltonian for a central potential

\[ H = \frac{\hat{\mathbf p}^2}{2m} + V(r) \]

(here the potential only depends on $r = |\mathbf r|$). We know the expression for $\hat{\mathbf p}^2$, and plugging it in here gives us

\[ H = -\frac{\hbar^2}{2m} \frac{1}{r} \frac{\partial^2}{\partial r^2} r + \frac{1}{2mr^2} \hat{\mathbf L}^2 + V(r). \]

This will be the starting point for helping us write the Schrodinger equation for our central potentials: it does indeed depend on our operator $\hat{\mathbf L}^2$.


Remark 250. No parentheses in an operator means that an operator acts on everything to its right. For example,

\[ \frac{\partial}{\partial r} r = 1 + r \frac{\partial}{\partial r} \]

(by using it on a test function $f(r)$).
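As a worked check of this convention, the operators can be applied to a test function $f(r)$ step by step. The lines below are a sketch of that computation; primes denote derivatives with respect to $r$.

```latex
\begin{align*}
\frac{\partial}{\partial r}\bigl(r f\bigr) &= f + r f'
  \quad\Longrightarrow\quad
  \frac{\partial}{\partial r}\, r = 1 + r\,\frac{\partial}{\partial r}, \\
\frac{1}{r}\,\frac{\partial^2}{\partial r^2}\bigl(r f\bigr)
  &= \frac{1}{r}\,\frac{\partial}{\partial r}\bigl(f + r f'\bigr)
   = \frac{1}{r}\bigl(2 f' + r f''\bigr)
   = f'' + \frac{2}{r}\, f'.
\end{align*}
```

The second line is the familiar radial part of the Laplacian, which is why $\frac{1}{r}\frac{\partial^2}{\partial r^2} r$ appears in the expression for the momentum squared.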


We can now discuss the concept of a set of commuting observables. Forming such a set helps us understand the 
physics attached to a particular Hamiltonian: the first thing in the list should be the Hamiltonian H itself, since 
we do care about the energy of our states. 

Now, we know that the $\hat x_1, \hat x_2, \hat x_3$ operators commute with each other, but they don't commute with the Hamiltonian, because there's a $\hat{\mathbf p}^2$ term. Similarly, we can't use $\hat p_1, \hat p_2, \hat p_3$, because there's an $x$-dependence in the potential and there's no reason in general for this to commute. Similarly $\hat{\mathbf r}^2$ or $\hat{\mathbf p}^2$ or $\hat{\mathbf r} \cdot \hat{\mathbf p}$ are bad, but the operator $\hat{\mathbf r} \times \hat{\mathbf p}$ is interesting: let's try using the operators
\[ \hat L_1, \hat L_2, \hat L_3. \]
We can check that the angular momentum commutes with the Hamiltonian: remember that the $\hat L_i$s commute with $\hat{\mathbf p}^2$ from the discussion above, and $V(r)$ is a function of $r$, where $r^2 = |\mathbf r|^2$ is a scalar under rotations. So anything that is a function of $r$ must commute with all of the $\hat L_i$s, so our Hamiltonian commutes with the angular momentum operators. This is then an angular momentum conservation statement, because we know that

\[ i\hbar \frac{d}{dt} \langle \hat L_i \rangle = \langle [\hat L_i, H] \rangle = 0. \]

So now we want to add $\hat L_1, \hat L_2, \hat L_3$ to our list of observables, but they don't commute with each other! We can only add one, and the convention is to use $\boxed{\hat L_z}$. Finally, we can add the operator $\boxed{\hat{\mathbf L}^2}$: this indeed commutes with all of the $\hat L_i$s.


Proposition 251 


The universal set of commuting observables for a central potential is

\[ \{ H, \hat L_z, \hat{\mathbf L}^2 \}. \]


And we can always add funny observables to this set, like spin, if that's a property of the particles themselves. (We'll see that there are many states with the same $\hat L_z$ but different total angular momenta, so we do indeed need $\hat{\mathbf L}^2$ to describe our system. But it's important to note that we can't actually measure different components of the angular momentum at once, because the operators don't commute!)

We want to learn about the kind of states that can exist in a system with the action of operators that behave like $J_i$, which are also Hermitian. We'll be able to derive powerful results, even in the case where systems have nothing to do with angular momentum. The first step is to introduce the operators

\[ J_\pm = J_1 \pm i J_2, \]

and note that

\[ J_+ J_- = (J_1 + i J_2)(J_1 - i J_2) = J_1^2 + J_2^2 - i [J_1, J_2] = \boxed{J_1^2 + J_2^2 + \hbar J_3}. \]

Similarly, we have that

\[ \boxed{J_- J_+ = J_1^2 + J_2^2 - \hbar J_3}, \]

and we can use these to find the commutator

\[ [J_+, J_-] = 2\hbar J_3, \]

as well as
\[ \mathbf J^2 = J_1^2 + J_2^2 + J_3^2 = J_+ J_- + J_3^2 - \hbar J_3 = J_- J_+ + J_3^2 + \hbar J_3. \]


These kinds of identities are pretty simple: we're deciding that we like $J_+$ and $J_-$ more than $J_1$ and $J_2$, and we're trying to figure out everything we can about them. Two other nice results we can find are
\[ [J_3, J_+] = [J_3, J_1 + i J_2] = i\hbar J_2 + \hbar J_1 = \hbar J_+, \]

and similarly
\[ [J_3, J_-] = -\hbar J_-. \]

This should look similar to the harmonic oscillator commutator
\[ [N, a^\dagger] = a^\dagger, \quad [N, a] = -a. \]

(Here, we've used the fact that our operators $J_1, J_2$ are Hermitian, so $J_+$ and $J_-$ are actually adjoints of each other.)

In the harmonic oscillator case, $a^\dagger$ increased the number eigenvalue of $N$, and $a$ decreased it. We'll see something similar in our new system: $J_+$ will increase the z-component of angular momentum, and $J_-$ will decrease it.

Here's where we need to make a physical declaration: there exist states in this setup that we've been creating. In the harmonic oscillator case, we create infinitely many states above the ground state, which is connected to the idea that the operators $\hat x$ and $\hat p$ cannot be represented with finite-dimensional matrices. But in the angular momentum case, we're actually going to find that there are finite-dimensional matrix representations!


So our set of commuting Hermitian operators contains (replacing $J_3$ with $J_z$ now)
\[ \{ \mathbf J^2, J_z \}. \]


Since these are Hermitian commuting operators, they are simultaneously diagonalizable, and we're saying that there 
are states that represent this diagonalization: our vector space should contain a list of orthogonal vectors that are 


eigenstates of both operators, and in fact we can make an orthonormal basis for the whole vector space. 


Definition 252 


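Define the orthonormal basis states $|j, m\rangle$ such that

\[ \mathbf J^2 |j, m\rangle = \hbar^2 j(j+1) |j, m\rangle, \quad J_z |j, m\rangle = \hbar m |j, m\rangle, \]

where $j, m \in \mathbb R$.


It seems reasonable at first to put
\[ \mathbf J^2 |j, m\rangle = \hbar^2 j^2 |j, m\rangle, \quad J_z |j, m\rangle = \hbar m |j, m\rangle. \]

But this isn't very convenient: we'll see later why our definition makes the algebra work out better. And we'll see soon also that $j, m$ will get quantized.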
To understand our definition a little more, we can evaluate

\[ \langle j, m | \mathbf J^2 | j, m \rangle = \hbar^2 j(j+1). \]

(We're sort of assuming that our states will be quantized so we don't need a delta function normalization factor.) But we also know that

\[ \langle j, m | \mathbf J^2 | j, m \rangle = \sum_i \langle j, m | J_i J_i | j, m \rangle = \sum_i \bigl\| J_i |j, m\rangle \bigr\|^2 \ge 0, \]
where we've used the fact that $J_i$ is Hermitian, so we must have
\[ j(j+1) \ge 0, \]

which means that we can label (parameterize) our states uniquely by either restricting our domain to $j \ge 0$ or $j \le -1$. (We'll use $j \ge 0$ from here on.)
The next step is to understand how $J_+$ and $J_-$ act on these states: first, note that $J_+$ and $J_-$ commute with $\mathbf J^2$, because $J_1, J_2, J_3$ commute with $\mathbf J^2$ and we just have linear combinations of them, which means that $J_+, J_-$ do not change the eigenvalue of $\mathbf J^2$ for a given state:

\[ \mathbf J^2 \bigl( J_\pm |j, m\rangle \bigr) = J_\pm \bigl( \mathbf J^2 |j, m\rangle \bigr) = \hbar^2 j(j+1)\, J_\pm |j, m\rangle. \]

So $J_\pm |j, m\rangle$ is also a state with the same (eigen)value of $\mathbf J^2$, which means it must correspond to the same value of $j$:

\[ J_\pm |j, m\rangle \sim |j, m'\rangle. \]
So we want to see how $J_\pm$ affect $m$, and here's where we have a bit of a calculation. Introducing a $J_z$ into the expression,
\[ J_z J_\pm |j, m\rangle = \bigl( [J_z, J_\pm] + J_\pm J_z \bigr) |j, m\rangle, \]

and we've calculated the commutator before, and we can let $J_z$ act on the state, to find that

\[ = (\pm\hbar J_\pm + \hbar m J_\pm) |j, m\rangle = \boxed{\hbar (m \pm 1) J_\pm |j, m\rangle}. \]

Therefore, the operator $J_z$ acts on $J_\pm |j, m\rangle$ to get an eigenvalue of $\hbar(m \pm 1)$, which means by definition of the $J_z$ eigenstates that our state satisfies

\[ J_\pm |j, m\rangle = c_\pm(j, m)\, |j, m \pm 1\rangle \]


for some constant of proportionality $c_\pm(j, m)$ to be determined. (We can label our states so that the $j$s line up.) To find these constants, we take the dagger of the equation above:

\[ \langle j, m | J_\mp = c_\pm(j, m)^* \langle j, m \pm 1 |, \]

and now putting these together to yield an inner product tells us that

\[ \langle j, m | J_\mp J_\pm | j, m \rangle = |c_\pm(j, m)|^2 \cdot 1. \]

The left hand side can be calculated by using the formulas we've derived before:

\[ |c_\pm(j, m)|^2 = \langle j, m | \bigl( \mathbf J^2 - J_z^2 \mp \hbar J_z \bigr) | j, m \rangle = \hbar^2 \bigl( j(j+1) - (m^2 \pm m) \bigr). \]

We've now found our constants, and taking the square root yields

\[ c_\pm(j, m) = \hbar \sqrt{j(j+1) - m(m \pm 1)} \]


(we can ignore the extra phase terms, since they don't do anything physically). And this is the reason why we use $j(j+1)$: it makes it easier to compare $m$s and $j$s. Now what's important is that this quantity $j(j+1) - m(m+1)$ must be nonnegative, so that $\| J_+ |j, m\rangle \|^2$ is a nonnegative number. This means
\[ j(j+1) - m(m+1) \ge 0 \implies m(m+1) \le j(j+1). \]

The right hand side of this is some nonnegative number, and the left hand side is a quadratic function of $m$. We have equality when $m = j$ but also when $m = -j - 1$, so the required condition is that $m$ must be between the two points:
\[ -j - 1 \le m \le j. \]
Similarly, the states $J_- |j, m\rangle$ must also have nonnegative norm, which means that
\[ j(j+1) - m(m-1) \ge 0. \]
An analogous argument tells us that we must have
\[ -j \le m \le j + 1, \]

and now both inequalities must hold:

\[ \boxed{-j \le m \le j}. \]


But we can say a little more than that: $J_+$ is supposed to increase $m$, so we run into trouble at some point. (Intuitively, we should think of $j$ as the “length” of the $\mathbf J$ vector, and $m$ is the $J_z$ component.) Indeed, when $m = j$, we have that $j(j+1) - m(m+1) = 0$, which means that our state vanishes completely beyond that point! The same thing is true for $m = -j$: we can't get to smaller values of $m$.

And now, we can think of this as having a ladder of states with $m \in [-j, j]$, where $J_+$ and $J_-$ increase or decrease $m$ by 1. Any state in our system created in this way must terminate at $-j$ or $j$ on the ends for consistency, which in particular means that the distance $j - (-j) = 2j$ must be an integer. And now we've gotten the discretization of our states: this indeed tells us that our particles can have spin or angular momentum of $0, \frac{1}{2}, 1, \frac{3}{2}$, and so on! And now we've arrived at the main result of angular momentum:


Theorem 253 


The values of the angular momentum can be $j = 0, \frac{1}{2}, 1, \frac{3}{2}, \cdots$, with a total of $1, 2, 3, 4, \cdots$ possible values for $m$, respectively.


For example, the states for $j = 1$ are $|1, -1\rangle, |1, 0\rangle, |1, 1\rangle$, and so on.
The punchline of this is that we were working with an infinite-dimensional vector space, but it breaks down into states of integer or half-integer $j$, and we need to figure out which values of $j$ are actually possible. Central potentials will have $0, 1, 2, \cdots$, spins will have $\frac{1}{2}$, and so on.
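The ladder construction also gives a recipe for explicit matrices. The sketch below builds $J_x, J_y, J_z$ for a given $j$ from the coefficients $c_+(j, m)$ derived above and checks the algebra numerically; the basis ordering ($m = j, j-1, \ldots, -j$), the choice $\hbar = 1$, and the function name are assumptions made for this illustration.

```python
import numpy as np

def angular_momentum_matrices(j, hbar=1.0):
    """Return (Jx, Jy, Jz) in the basis |j, m> ordered m = j, j-1, ..., -j."""
    dim = int(round(2 * j)) + 1
    m = j - np.arange(dim)
    jz = hbar * np.diag(m).astype(complex)
    # <j, m+1| J_+ |j, m> = hbar * sqrt(j(j+1) - m(m+1)) for m = j-1, ..., -j
    c_plus = hbar * np.sqrt(j * (j + 1) - m[1:] * (m[1:] + 1))
    j_plus = np.diag(c_plus, k=1).astype(complex)
    j_minus = j_plus.conj().T          # J_- is the adjoint of J_+
    jx = (j_plus + j_minus) / 2
    jy = (j_plus - j_minus) / (2 * 1j)
    return jx, jy, jz

for j in (0, 0.5, 1, 1.5, 2):
    jx, jy, jz = angular_momentum_matrices(j)
    j_squared = jx @ jx + jy @ jy + jz @ jz
    algebra_ok = np.allclose(jx @ jy - jy @ jx, 1j * jz)
    casimir_ok = np.allclose(j_squared, j * (j + 1) * np.eye(len(jz)))
    print(f"j = {j}: dimension {len(jz)}, [Jx,Jy] = i Jz: {algebra_ok}, "
          f"J^2 = j(j+1): {casimir_ok}")
```

The dimensions come out as $2j + 1 = 1, 2, 3, 4, 5$, matching the count of $m$ values in the theorem.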


Example 254 

Consider a two-dimensional simple harmonic oscillator, where we have $a_x, a_y, a_x^\dagger, a_y^\dagger$ as our operators.


This may seem strange — we have a two-dimensional oscillator even though we've been talking about three- 
dimensional angular momentum. But we're going to get an angular momentum that pops out here — it’s abstract, but 
it has important properties! 

We can start by looking at the spectrum: we have the ground state $|0\rangle$, the first excited states $a_x^\dagger |0\rangle$ and $a_y^\dagger |0\rangle$, the second excited states $a_x^\dagger a_x^\dagger |0\rangle$, $a_x^\dagger a_y^\dagger |0\rangle$, and $a_y^\dagger a_y^\dagger |0\rangle$, and so on. In general, there are $(n + 1)$ states in the $n$th excited level

\[ (a_x^\dagger)^n |0\rangle, \ (a_x^\dagger)^{n-1} a_y^\dagger |0\rangle, \ \cdots, \ (a_y^\dagger)^n |0\rangle. \]

And this actually relates to having $1, 2, 3, 4, \cdots$ states in the varying levels of $j$. Let's see how that plays out. We can start by introducing the operators
\[ a_R = \frac{1}{\sqrt 2} (a_x - i a_y), \quad a_L = \frac{1}{\sqrt 2} (a_x + i a_y), \]

as well as the number operators

\[ N_R = a_R^\dagger a_R, \quad N_L = a_L^\dagger a_L. \]

These new “left” and “right” operators don't mix, and now we can rewrite our excited states: we have the ground state $|0\rangle$, the first excited states $a_R^\dagger |0\rangle$ and $a_L^\dagger |0\rangle$, the second excited states $a_R^\dagger a_R^\dagger |0\rangle$, $a_R^\dagger a_L^\dagger |0\rangle$, and $a_L^\dagger a_L^\dagger |0\rangle$, and so on. (This is completely analogous to the result above.) But remember that we can compute the angular momentum in the z-direction

\[ \hat L_z = \hat x \hat p_y - \hat y \hat p_x = \hbar (N_R - N_L), \]
and now we can see what values of $\hat L_z$ we have here. The ground state $|0\rangle$ has $\hat L_z = 0$, the first excited states have $\hbar$ and $-\hbar$ respectively, the second excited states have $2\hbar, 0, -2\hbar$ respectively, and so on. This isn't exactly the
correct values of $m$ that we derived earlier (within a level, these values are spaced by $2\hbar$ rather than $\hbar$), but we can turn to another aspect of this theory: remember that $J_+$ keeps increasing the angular momentum until we annihilate our state $|j, j\rangle$, so we should see if something similar happens here. So our corresponding $J_+$ operator should actually be
\[ J_+ = \beta\, a_R^\dagger a_L, \]

which kills the top state $(a_R^\dagger)^n |0\rangle$ for any $n$, and then the corresponding dagger operator is
\[ J_- = \beta\, a_L^\dagger a_R. \]

Indeed, this kills the bottom state $(a_L^\dagger)^n |0\rangle$ for any $n$, and now we have all of the important parts except for one conceptual step: there is no angular momentum in the two-dimensional plane, so we'll instead introduce an abstract angular momentum

\[ J_z = \frac{\hbar}{2} (N_R - N_L). \]

And now things seem to fit: the value of $J_z$ is 0 on our ground state, $-\frac{\hbar}{2}$ and $\frac{\hbar}{2}$ on the first excited states, and $-\hbar, 0, \hbar$ on the second excited states. This now means we can introduce our other angular momentum operators
\[ J_+ = \beta\, a_R^\dagger a_L, \quad J_- = \beta\, a_L^\dagger a_R, \]

such that we have $J_x, J_y, J_z$ satisfying the algebra of angular momentum. Once we verify that we can find such a $\beta$ (requiring $[J_+, J_-] = 2\hbar J_z$ is satisfied by $\beta = \hbar$), our states do indeed need to organize themselves into representations of that angular momentum! So our two-dimensional harmonic oscillator has all spin representations: $j = 0, \frac{1}{2}, 1, \cdots$. And the only thing we have to check is that the $J_i$s commute with the Hamiltonian, which depends only on $N_L + N_R$. This is the first example of a hidden symmetry in a physical problem that we've encountered, and it allows us to explain how the degeneracies in energy levels can fall into angular momentum representations.
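The bookkeeping in this example can be tabulated level by level. This is a small sketch, assuming states are labeled by occupation numbers $(n_R, n_L)$ with $n_R + n_L = n$; the function name `level_states` is hypothetical.

```python
from fractions import Fraction

def level_states(n):
    """Occupation numbers (n_R, n_L) with n_R + n_L = n, from top state to bottom state."""
    return [(n_r, n - n_r) for n_r in range(n, -1, -1)]

for n in range(4):
    states = level_states(n)
    lz = [n_r - n_l for n_r, n_l in states]               # L_z in units of hbar
    m = [Fraction(n_r - n_l, 2) for n_r, n_l in states]   # J_z in units of hbar
    j = Fraction(n, 2)
    print(f"level {n}: {len(states)} states, j = {j}, "
          f"L_z/hbar = {lz}, m = {[str(value) for value in m]}")
```

Level $n$ contains $n + 1$ states whose $m$ values run from $j$ down to $-j$ in unit steps with $j = n/2$, which is exactly one multiplet of the abstract angular momentum.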


32 April 13, 2020 


We've been discussing tensor products recently, and there's a lot of properties that we'll want to go over and understand 
well. Because of the homework due tomorrow, we'll finish some of the discussion from last time first. 
As a reminder, we were discussing operators on tensor product spaces last time: for example, the operator $H \otimes I \in \mathcal L(V \otimes W)$ acts on a vector in our space via
\[ (H \otimes I)(v \otimes w) = Hv \otimes w. \]

Last time, we stated a general result:


Proposition 255
$\boxed{\mathcal L(U \otimes V) = \mathcal L(U) \otimes \mathcal L(V)}$: the vector space of linear operators on $U \otimes V$ can be written as the tensor product of the individual linear operator spaces of $U$ and $V$.


This might be a bit disorienting, but it’s important if we (for example) care about finding the most general operator 
on a vector space. 


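We'll explain this in the case of a finite-dimensional vector space. We can think of linear operators on a vector space $U$ of dimension $N$ as matrices, spanned by the basis matrices:
\[ \mathcal L(U) = \operatorname{span} \{ |e_i\rangle \langle e_j| \}, \quad 1 \le i, j \le N. \]

Here, $|e_i\rangle \langle e_j|$ represents the matrix with a 1 in the $(i, j)$ entry and a 0 everywhere else. Similarly, for a vector space $V$ of dimension $M$ with basis vectors $|f_k\rangle$, we can write
\[ \mathcal L(V) = \operatorname{span} \{ |f_k\rangle \langle f_\ell| \}, \quad 1 \le k, \ell \le M. \]

And now if we try to write down the basis vectors for $\mathcal L(U) \otimes \mathcal L(V)$, recall that we should tensor the basis vectors for $\mathcal L(U)$ and $\mathcal L(V)$ together: this gives us $N^2 M^2 = (MN)^2$ basis vectors in total, which matches the dimension of $\mathcal L(U \otimes V)$. Thus,

\[ \mathcal L(U) \otimes \mathcal L(V) = \operatorname{span} \{ |e_i\rangle \langle e_j| \otimes |f_k\rangle \langle f_\ell| \}, \quad 1 \le k, \ell \le M, \ 1 \le i, j \le N. \]

So the most general linear operator in $\mathcal L(U) \otimes \mathcal L(V)$ is a linear combination of the basis vectors, meaning it is of the form

\[ \boxed{S = \sum_{i,j,k,\ell} c_{ijk\ell}\, |e_i\rangle \langle e_j| \otimes |f_k\rangle \langle f_\ell|}, \]

where the $c_{ijk\ell}$ are numbers.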
That accounts for the right hand side in the proposition above, and now let's try to look at the left side. An example we gave last time of an operator on $U \otimes V$ is the swap operator in the case where $U = V$:
\[ T(u \otimes v) = v \otimes u. \]

A good way to understand linear operators is to let them act on basis vectors: thus, let's apply $T$ to $|e_p\rangle \otimes |e_q\rangle$. Then

\[ T\, |e_p\rangle \otimes |e_q\rangle = |e_q\rangle \otimes |e_p\rangle \]
basically just swaps the indices, which means that in general, we have
\[ T(u \otimes v) = T \bigl( u^p |e_p\rangle \otimes v^q |e_q\rangle \bigr) \]
(where we're summing over $p, q$) and then we can bring the constants $u^p, v^q$ outside and swap the indices to get
\[ = u^p v^q\, T \bigl( |e_p\rangle \otimes |e_q\rangle \bigr) = u^p v^q\, |e_q\rangle \otimes |e_p\rangle. \]

Looking at the two inner terms, notice that $v^q |e_q\rangle$ represents a vector $v$ in the vector space $V$. Similarly, the two outer terms $u^p$ and $|e_p\rangle$ combine to represent the vector $u$ in the vector space $U$, and now we indeed have $T(u \otimes v) = v \otimes u$, as desired!

What this illustrates is that we indeed only need to define $T$ on the basis vectors, which (we'll soon show) means we just need to define our linear operator on $U$ and $V$ separately. The punchline now is that because the operator $A = |j\rangle \langle i|$ gets us from $|i\rangle$ to $|j\rangle$, we can write

\[ \boxed{T = \sum_{p,q} \bigl( |e_q\rangle \otimes |e_p\rangle \bigr) \bigl( \langle e_p| \otimes \langle e_q| \bigr)}. \]

But comparing this to the boxed equation for $S$ above might make this look more familiar: we can rearrange this as

\[ = \sum_{p,q} |e_q\rangle \langle e_p| \otimes |e_p\rangle \langle e_q| \]

(by the definition of composition of linear operators). But now we've understood what's going on: (1) the boxed 
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equation for T represents our linear operator as an element of £(U @ V), because we put a ket basis vector next to a 
bra basis vector and sum over all possibilities. But (2) the line immediately below that has rewritten the linear operator 
so that we're actually tensoring together a linear operator in £(U) and a linear operator in L(V) (and taking linear 


combinations)! In generality, any operator 
S= DF cise le) (| @ lek) (ey |. 
ijke 
which lives in £(U) ® L(V), can be rewritten as 
S= DF cine (le) ® lek)) (Ce7] @ (ef). 
ijk e 
and now the ® symbol is “tensoring our vectors, not our operators,” so this now lives in L(U @ V). 


Fact 256
Remember that when we write $\mathcal{L}(U) \otimes \mathcal{L}(V)$, we mean that we take linear combinations of (operators on $U$) tensored with (operators on $V$). We can't always write an operator as a single product $S \otimes T$ for $S \in \mathcal{L}(U)$ and $T \in \mathcal{L}(V)$, just like in the case with entangled particles.


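An interesting question: what are the coefficients $c_{ij,k\ell}$ for our swap operator $T$ here? Remember that coefficients correspond to matrix entries, so we'll write our swap operator $T$ in matrix form. Let's do the case $d = 2$, so our basis vectors are $|+\rangle$ and $|-\rangle$. We know that one term of $T$ (the one with $p = +$, $q = -$) looks like
$$|-\rangle|+\rangle\langle +|\langle -| = |-\rangle\langle +| \otimes |+\rangle\langle -|,$$
and the other three terms are
$$|+\rangle|+\rangle\langle +|\langle +| = |+\rangle\langle +| \otimes |+\rangle\langle +|,$$
$$|+\rangle|-\rangle\langle -|\langle +| = |+\rangle\langle -| \otimes |-\rangle\langle +|,$$
$$|-\rangle|-\rangle\langle -|\langle -| = |-\rangle\langle -| \otimes |-\rangle\langle -|.$$

Adding these four things together gives us the whole operator $T$. Remember that this is telling us the action on each of the $2 \times 2 = 4$ basis vectors, so it determines everything. And now we can write everything in matrix form using the right-hand sides. Representing $|+\rangle$ by the column vector $(1, 0)^T$ and $|-\rangle$ by $(0, 1)^T$, we get
$$T = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \otimes \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \otimes \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \otimes \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix},$$
and we have a definition for the tensor product of matrices (which is consistent with the way we define our tensor product): the idea is that $A \otimes B$ can be thought of as replacing each entry $a_{ij}$ of $A$ with the block $a_{ij} B$. Wherever $A$ has component 0, the whole copy of $B$ vanishes, so each of the four terms is a $4 \times 4$ matrix with a single nonzero entry: a 1 in position $(1,1)$, $(3,2)$, $(2,3)$, and $(4,4)$ respectively. So
$$T = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$
and thus this tells us that four of the sixteen coefficients $c_{ij,k\ell}$ are 1s, while the other twelve are 0s.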
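The construction of the swap operator can be checked mechanically. The following is a minimal sketch, assuming the convention $|+\rangle = (1,0)$, $|-\rangle = (0,1)$; the helper names (`outer`, `kron`, `kron_vec`, `apply`) are made up for this illustration and are not from the lecture.

```python
# Sketch: build T = sum_{p,q} |e_q><e_p| (x) |e_p><e_q| for d = 2.
# Assumed convention: |+> = (1, 0), |-> = (0, 1). Helper names are illustrative.

def outer(ket, bra):
    """Matrix |ket><bra| for real vectors given as lists."""
    return [[k * b for b in bra] for k in ket]

def kron(A, B):
    """Tensor product of matrices: each entry of A multiplies a full copy of B."""
    return [
        [a * b for a in row_a for b in row_b]
        for row_a in A
        for row_b in B
    ]

def add(A, B):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def apply(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def kron_vec(u, v):
    """Tensor product of two vectors."""
    return [a * b for a in u for b in v]

basis = [[1, 0], [0, 1]]  # |+>, |->

T = [[0] * 4 for _ in range(4)]
for e_p in basis:
    for e_q in basis:
        T = add(T, kron(outer(e_q, e_p), outer(e_p, e_q)))

# Check the defining property T(u (x) v) = v (x) u on a sample pair.
u, v = [2, 3], [5, 7]
swapped = apply(T, kron_vec(u, v))
```

Here `T` comes out as a permutation matrix with exactly four nonzero entries, and `swapped` agrees with `kron_vec(v, u)`.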

To finish, we can talk a bit about cloning: we've studied teleportation in lecture, in which Alice can teleport a quantum state to Bob by just sending two bits. In such a case, Alice does not have the state anymore: Bob's state is rearranged at the same time that Alice's state is damaged beyond repair. So if we want to talk about a cloning machine, we would take a state $|\psi\rangle = a|+\rangle + b|-\rangle$, put it through a machine, and end up with two copies of that state.

Well, the machine cannot create new particles out of thin air, so we must start with a second particle in some fixed blank state $|a\rangle$ (just like with a photocopy machine). So this machine takes in two particles $|\psi\rangle$ and $|a\rangle$, and we need a linear operator $U$ such that we end up with two identical states:
$$U\left(|\psi\rangle \otimes |a\rangle\right) = |\psi\rangle \otimes |\psi\rangle.$$

Then the no cloning theorem tells us that there is no such unitary operator $U$ that can do this for all states! We'll discuss this more later on.


33 Angular Momentum, Part 2 


Last lecture, we discussed (with an algebraic analysis) states $|j, m\rangle$, which we'll now label with $|\ell, m\rangle$ because we're talking about orbital angular momentum. With orbital angular momentum, we can't actually have half-integer values of $j$. In fact, systems like spin states don't have wavefunctions in this sense; only states of integer angular momentum have wavefunctions, and those are the spherical harmonics we'll be discussing today.

Remember that our indexing $|\ell, m\rangle$ has $\ell \ge 0$ and $-\ell \le m \le \ell$, both integers: then we know that
$$\hat{L}^2 |\ell, m\rangle = \hbar^2 \ell(\ell+1)\, |\ell, m\rangle, \qquad \hat{L}_z |\ell, m\rangle = \hbar m\, |\ell, m\rangle.$$


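To approach the problem of finding these wavefunctions, remember that we already did some work in constructing the $\hat{L}^2$ operator. Specifically, we have
$$\hat{L}^2 = -\hbar^2 \left( \frac{1}{\sin\theta} \frac{\partial}{\partial\theta} \sin\theta \frac{\partial}{\partial\theta} + \frac{1}{\sin^2\theta} \frac{\partial^2}{\partial\phi^2} \right)$$
and (we didn't do this explicitly, but it's a similar derivation)
$$\hat{L}_z = \frac{\hbar}{i} \left( x \frac{\partial}{\partial y} - y \frac{\partial}{\partial x} \right) = \frac{\hbar}{i} \frac{\partial}{\partial\phi}.$$

(We should think of this as rotating around the $z$-axis, so it changes $\phi$ but not $\theta$.) We also defined the operators $\hat{L}_\pm$ last time, and we can also write those in angular form:
$$\hat{L}_\pm = \hbar e^{\pm i\phi} \left( i \frac{\cos\theta}{\sin\theta} \frac{\partial}{\partial\phi} \pm \frac{\partial}{\partial\theta} \right).$$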

This takes a bit of algebra, but we can find it in various books, and the whole point is that we have differential operators that act on $\theta$ and $\phi$ and don't care about the radius. So mathematical physicists invented spherical harmonics of the form $Y_{\ell m}(\theta, \phi)$, defined such that
$$\hat{L}^2 Y_{\ell m} = \hbar^2 \ell(\ell+1)\, Y_{\ell m}$$
and
$$\hat{L}_z Y_{\ell m} = \hbar m\, Y_{\ell m},$$
where we think of $\hat{L}^2$ and $\hat{L}_z$ as the differential operators in $\theta, \phi$. We can think of these functions as being wavefunctions for our states $|\ell, m\rangle$, and this is the natural way to think of them:
$$Y_{\ell m}(\theta, \phi) = \langle \theta, \phi | \ell, m \rangle,$$

which is the analogous idea of saying that $\psi(x) = \langle x | \psi \rangle$, only with angular coordinates. In order to extract some more properties so that the identification here is natural, we can start with the completeness relation. In three dimensions, we know that
$$\int d^3x\, |\mathbf{x}\rangle\langle\mathbf{x}| = 1$$
is a completeness relation for position states, and our goal is to do this for spherical coordinates: we find that
$$\int dr\, (r\, d\theta)(r \sin\theta\, d\phi)\, |r\theta\phi\rangle\langle r\theta\phi| = 1.$$
We want to ignore the part that happens with $r$, so we'll write this as
$$\int \sin\theta\, d\theta\, d\phi\, |\theta\phi\rangle\langle\theta\phi| \int r^2\, dr\, |r\rangle\langle r| = 1,$$
where we're basically "splitting up" the states in the orthogonal angular directions and the radial direction. But the two integrals here don't talk to each other, so we can say that the first integral acts as a completeness relation for things that just depend on $\theta$ and $\phi$: in other words, we'll postulate that we have a completeness relation
$$\int \sin\theta\, d\theta\, d\phi\, |\theta\phi\rangle\langle\theta\phi| = 1.$$

Rewriting, since we know that
$$\int_0^\pi d\theta\, \sin\theta \int_0^{2\pi} d\phi = -\int_1^{-1} d(\cos\theta) \int_0^{2\pi} d\phi,$$
we can rewrite the term $\int_{-1}^{1} d(\cos\theta) \int_0^{2\pi} d\phi = \int d\Omega$ as the integral over solid angle, and now we just know that
$$\int d\Omega\, |\theta\phi\rangle\langle\theta\phi| = 1.$$
So when we're trying to define spherical harmonics (the wavefunctions of the $|\ell, m\rangle$ states), we know they are orthonormal, meaning
$$\langle \ell' m' | \ell m \rangle = \delta_{\ell\ell'}\, \delta_{mm'}.$$

Remember that orthogonality is guaranteed here because we have Hermiticity and distinct eigenvalues. Specifically, we can compute the overlap by inserting the completeness relation:
$$\langle \ell' m' | \ell m \rangle = \int d\Omega\, \langle \ell' m' | \theta\phi \rangle \langle \theta\phi | \ell m \rangle = \int d\Omega\, Y^*_{\ell' m'}(\theta, \phi)\, Y_{\ell m}(\theta, \phi) = \delta_{\ell\ell'}\, \delta_{mm'}.$$

And from here, we can construct the wavefunctions in various ways from the quantum mechanical intuition: we can start by building the state $Y_{\ell\ell}$, because $\hat{L}_+$ kills this state, meaning the differential equation is particularly simple. From there, we can find $Y_{\ell,\ell-1}$ and so on, using the lowering operator repeatedly. But the formulas are messy and normalization is annoying, so we won't talk about that much. If we ever need a spherical harmonic, we can just look in a textbook.
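As a quick sanity check of the orthonormality relation, we can integrate a few low-order spherical harmonics numerically. This is only a sketch: the closed forms for $Y_{00}$, $Y_{10}$, $Y_{11}$ below are the standard textbook expressions (not derived in these notes), and the midpoint-rule integrator `overlap` is a hypothetical helper written for this illustration.

```python
import cmath
import math

# Standard closed forms (quoted from textbooks, not derived in these notes).
def Y00(theta, phi):
    return complex(1.0 / math.sqrt(4.0 * math.pi))

def Y10(theta, phi):
    return complex(math.sqrt(3.0 / (4.0 * math.pi)) * math.cos(theta))

def Y11(theta, phi):
    return -math.sqrt(3.0 / (8.0 * math.pi)) * math.sin(theta) * cmath.exp(1j * phi)

def overlap(Ya, Yb, n_theta=200, n_phi=200):
    """Midpoint-rule estimate of the solid-angle integral of conj(Ya) * Yb."""
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0j
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            # sin(theta) is the solid-angle measure: dOmega = sin(theta) dtheta dphi
            total += Ya(theta, phi).conjugate() * Yb(theta, phi) * math.sin(theta)
    return total * d_theta * d_phi

norm_11 = overlap(Y11, Y11)    # should be close to 1
cross_01 = overlap(Y00, Y10)   # should be close to 0
```

The diagonal overlaps come out close to 1 and the off-diagonal ones close to 0, up to the discretization error of the grid.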


So now we can discuss the radial equation: suppose we have a Hamiltonian
$$H = \frac{\mathbf{p}^2}{2m} + V(r),$$
which we can rewrite as
$$H = -\frac{\hbar^2}{2m} \frac{1}{r} \frac{\partial^2}{\partial r^2}\, r + \frac{\hat{L}^2}{2mr^2} + V(r).$$

We'll solve the Schrodinger equation for this Hamiltonian using separation of variables: we'll write our wavefunctions satisfying $H\psi = E\psi$ as
$$\psi_{E\ell m}(\mathbf{x}),$$
labeled by the energy and two angular momentum parameters. This labeling isn't going to be exactly correct, but we'll try the following idea first: we want to rewrite this as a product
$$\psi_{E\ell m}(\mathbf{x}) = f_{E\ell m}(r)\, Y_{\ell m}(\theta, \phi).$$

This is our initial ansatz, and we can try plugging this into the Schrodinger equation: we can cancel a $Y_{\ell m}$ term throughout, and we're left with
$$-\frac{\hbar^2}{2m} \frac{1}{r} \frac{d^2}{dr^2}\left(r f_{E\ell m}\right) + \frac{\hbar^2}{2mr^2}\, \ell(\ell+1)\, f_{E\ell m} + V(r)\, f_{E\ell m} = E\, f_{E\ell m},$$
where the second term comes from the definition of $Y_{\ell m}$. But now this differential equation doesn't depend on $m$ at all (the $m$ in the denominator is a mass, not the label $m$ for our states), so $f$ is a function of $r$ indexed only by $E$ and $\ell$, and now we can multiply through by $r$ to find that
$$-\frac{\hbar^2}{2m} \frac{d^2}{dr^2}\left(r f_{E\ell}\right) + \frac{\hbar^2 \ell(\ell+1)}{2mr^2}\left(r f_{E\ell}\right) + V(r)\left(r f_{E\ell}\right) = E\left(r f_{E\ell}\right).$$


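This motivates the definition of
$$u_{E\ell}(r) = r f_{E\ell}(r),$$
and now our differential equation is
$$-\frac{\hbar^2}{2m} \frac{d^2 u_{E\ell}}{dr^2} + \left( V(r) + \frac{\hbar^2 \ell(\ell+1)}{2mr^2} \right) u_{E\ell} = E\, u_{E\ell}.$$
This is known as the radial equation, and the expression $V(r) + \frac{\hbar^2 \ell(\ell+1)}{2mr^2}$ is often called the effective potential.

The wavefunction $\psi$ is now of the form
$$\psi_{E\ell m} = \frac{u_{E\ell}(r)\, Y_{\ell m}(\theta, \phi)}{r},$$
and we can find $u$ by solving a one-dimensional Schrodinger equation with effective potential depending on $\ell$. So the central potential problem is actually infinitely many one-dimensional Schrodinger equations, one for each value of $\ell$!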
From here, the first thing we'll discuss is the question of normalization and boundary conditions. If we want to normalize a wavefunction, we want
$$\int d^3x\, |\psi_{E\ell m}(\mathbf{x})|^2 = 1,$$
and we can convert this into spherical coordinates and plug in our separated functional form to get
$$\int_0^\infty r^2\, dr\, \frac{|u_{E\ell}(r)|^2}{r^2} \int d\Omega\, Y^*_{\ell m}(\theta, \phi)\, Y_{\ell m}(\theta, \phi) = 1.$$
(The $r^2$ in the denominator comes from squaring $\psi$.) But now the angular integral is just 1 by orthonormality: it corresponds to the case where $\ell = \ell'$ and $m = m'$. And the factors of $r^2$ cancel out, and we have a nice condition for normalization:
$$\int_0^\infty dr\, |u_{E\ell}(r)|^2 = 1.$$

So in a way, $u$ does really play the role of a wavefunction on a line: the integral of its squared magnitude should be 1.


Proposition 257
This leads us to a main point, which is something that should stick in our head: when we want to organize our spectrum for such a problem, we should draw a plot with $\ell$ on the horizontal axis and $E$ on the vertical.

Most of the time, we'll have bound states, and that means we'll have states for values of $\ell, m$ and some energies $E$. For each of $\ell = 0, 1, 2, 3, \cdots$ (which we can draw along the horizontal axis as a histogram), we'll typically have $E$ being quantized, and we won't have any degeneracies within a column, because each column comes from a one-dimensional problem and bound states in one dimension have a nondegenerate spectrum. This means that we can draw a discrete set of lines for each value of $\ell$, with each one corresponding to an eigenstate: we'll end up with a sequence of horizontal lines above each value of $\ell$.

[Figure: energy-level diagram with $\ell$ on the horizontal axis and $E$ (energy) on the vertical axis.]

For the first column $\ell = 0$, we can then label the energies starting from the ground state as $E_{1,0}, E_{2,0}, \cdots$, and we find these energies by solving the radial equation with $\ell = 0$. And then we can do the same with the second column, $\ell = 1$: since the effective potential is larger, the energies will be higher (or at least the ground state energy will be higher). We can then label them $E_{1,1}, E_{2,1}, \cdots$, and then we can repeat with higher and higher values of $\ell$. And no two lines will coincide within a column, because no two bound states with the same value of $\ell$ will have the same energy.

But that doesn't mean that there's only one state for each line that we draw! For example, remember that $\ell = 1$ comes with three different possible values of $m$, and the energy doesn't depend on $m$. So the energy of the $E_{1,1}$ multiplet actually corresponds to three states. Similarly, $E_{1,2}$ corresponds to five states, and so on.


Fact 258
From here, our next question will be studying the behavior of the wavefunction more carefully: we'll see what happens when $r \to 0$.

It seems like normalization is the main thing we care about: perhaps, as long as the function doesn't diverge near 0, anything will be okay. But it turns out this is false, and we actually need
$$\boxed{\lim_{r \to 0} u_{E\ell}(r) = 0}$$
as well. To understand why this is the case, let's look at a simple case where something goes wrong: suppose that
$$\lim_{r \to 0} u_{E\ell}(r) = c \ne 0.$$
Normalization isn't a problem here, so something else must be the problem. Let's look at the case where $\ell = 0$ for simplicity, and now our wavefunction corresponding to $u_{E0}$ looks like
$$\psi_{E00} = \frac{u_{E0}(r)}{r}\, Y_{00} \approx \frac{c'}{r}$$
near the origin ($m = 0$ if $\ell = 0$), where we've used the fact that $Y_{00}$ is just a constant. But now $\psi$ looks like $\frac{c'}{r}$ as $r$ approaches 0, and this is bad: the Schrodinger equation tells us that $H\psi = -\frac{\hbar^2}{2m}\nabla^2\psi + \cdots$, and the Laplacian term is $\nabla^2 \frac{1}{r} = -4\pi\delta(\mathbf{x})$. And there's no reason to expect there to be a delta function in the potential, because that gives us infinitely many bound states. Thus we can't cancel that term in the Schrodinger equation, and thus we can't get wavefunctions to work out in this case.


We can say something more about these potentials, too:

Fact 259
We're going to look at cases where the centrifugal barrier, which is the $\frac{\hbar^2 \ell(\ell+1)}{2mr^2}$ term of the effective potential, dominates when $r$ goes to 0.

So $V(r)$ might look like $\frac{1}{r}$, but it's not $\frac{1}{r^2}$ or something worse. Then we can look at the differential equation, and now $V(r)\, u$ and $E\, u$ are less important than the centrifugal term as $r$ goes to 0. Thus, at leading order, we just keep the kinetic and centrifugal terms:
$$-\frac{\hbar^2}{2m} \frac{d^2 u_{E\ell}}{dr^2} + \frac{\hbar^2 \ell(\ell+1)}{2mr^2}\, u_{E\ell} = 0.$$
Simplifying constants, we end up with
$$\frac{d^2 u_{E\ell}}{dr^2} = \frac{\ell(\ell+1)}{r^2}\, u_{E\ell}.$$
It turns out that the solution here is of the form $u_{E\ell} = r^s$, and plugging this in yields $s(s-1) = \ell(\ell+1)$, so either $s = \ell+1$ or $s = -\ell$. But the latter case looks like $u_{E\ell} \sim \frac{1}{r^\ell}$, and this does not go to 0 as $r \to 0$ as long as $\ell \ge 1$. (For $\ell = 0$, the second solution is a constant, which is exactly the case we just ruled out.)

Proposition 260
When the centrifugal barrier dominates, the wavefunction will look like
$$u_{E\ell} \sim r^{\ell+1} \quad \text{as } r \to 0.$$

And because our wavefunction is written in terms of $f = \frac{u}{r}$, this means that
$$f_{E\ell} \sim r^\ell,$$
which means that $f$ behaves like a constant for $\ell = 0$. Physically, this means that when we have zero orbital angular momentum, there is some chance of having the particle near the origin. However, for any $\ell > 0$, $f$ must vanish at the origin, and this explains the name centrifugal barrier: we can't get too close to $r = 0$.
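The two exponents can be checked numerically. The sketch below (with a hypothetical helper `residual`, and a finite-difference second derivative chosen purely for illustration) plugs the trial solution $u = r^s$ into $u'' = \ell(\ell+1)\,u/r^2$ and measures how badly the equation fails.

```python
def second_derivative(u, r, h=1e-4):
    """Central finite-difference estimate of u''(r)."""
    return (u(r + h) - 2.0 * u(r) + u(r - h)) / h**2

def residual(ell, s, r=0.5):
    """Relative mismatch in u'' = ell (ell + 1) u / r^2 for the trial u = r^s."""
    u = lambda x: x**s
    lhs = second_derivative(u, r)
    rhs = ell * (ell + 1) * u(r) / r**2
    return abs(lhs - rhs) / max(abs(rhs), 1.0)

for ell in range(4):
    good_regular = residual(ell, ell + 1)   # s = ell + 1: vanishes at the origin
    good_singular = residual(ell, -ell)     # s = -ell: solves the equation, but is rejected
    print(ell, good_regular, good_singular)

wrong = residual(2, 2)  # an exponent that does not satisfy s(s - 1) = ell(ell + 1)
```

Both $s = \ell + 1$ and $s = -\ell$ give residuals at the level of the finite-difference error, while a wrong exponent such as $s = \ell$ (for $\ell = 2$) gives a residual of order one. The boundary condition $u \to 0$, not the differential equation, is what selects $s = \ell + 1$.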


Fact 261
Next, we can consider the case where $r$ goes to infinity: again, we need to be careful about what we're assuming, and the analysis here is richer than we can state quickly.

We'll just consider some simple cases: in the case where $V(r) = 0$ for all $r > r_0$, or when $rV(r) \to 0$ as $r$ goes to infinity, we can ignore the contribution of $V$ at large $r$.

Remark 262. These two cases do not account for the hydrogen atom, which has a potential proportional to $\frac{1}{r}$, so we'll need to figure out how to deal with that separately (and we will soon).

The point of these special cases is that at large $r$, both $V(r)$ and the centrifugal term (which falls off like $\frac{1}{r^2}$) are negligible compared to the energy term, and now we just have
$$-\frac{\hbar^2}{2m} \frac{d^2 u_{E\ell}}{dr^2} = E\, u_{E\ell}.$$

There are now two cases: when $E < 0$, we have a decaying exponential
$$u_{E\ell} \sim \exp\left( -\sqrt{\frac{2m|E|}{\hbar^2}}\; r \right)$$
(it turns out that the hydrogen atom will get a power of $r$ multiplied in here somewhere), and when $E > 0$, we have oscillating solutions
$$u_{E\ell} \sim \exp(\pm ikr), \qquad k = \sqrt{\frac{2mE}{\hbar^2}}.$$

With this, it's easy to make qualitative plots of how our solutions look: we know how $u$ looks near the origin (proportional to $r^{\ell+1}$) and far away (bound states decay exponentially), and this is the kind of study that we do in 8.04.


Example 263
We'll now start to solve the radial equation with specific potentials: we'll begin with the free particle.

This is more nontrivial in spherical coordinates than it is in the Cartesian case! We know that free particles in the usual description have a fixed energy and momentum, so we usually label them by three momenta or with an energy and direction. But we won't be using momentum eigenstates for our spherical coordinates, and this method will help us solve more complicated problems too.

To be more precise, we can label the states of a free particle with three numbers: sometimes we use $p_1, p_2, p_3$ (the components of the momentum) or $E, \theta, \phi$ (the energy and a direction). In our case, we'll be using $(E, \ell, m)$, and it turns out that we'll end up with the same number of states anyway.

Our differential equation here looks like
$$-\frac{\hbar^2}{2m} \frac{d^2 u_{E\ell}}{dr^2} + \frac{\hbar^2 \ell(\ell+1)}{2mr^2}\, u_{E\ell} = E\, u_{E\ell}.$$

(Remember that the $V$ term is just zero, but the effective potential is still nonzero.) Canceling the constants, this just becomes
$$\boxed{-\frac{d^2 u_{E\ell}}{dr^2} + \frac{\ell(\ell+1)}{r^2}\, u_{E\ell} = k^2 u_{E\ell}},$$
where $E$ is a positive energy, meaning $k = \sqrt{2mE/\hbar^2}$ is defined as above. This equation is interesting: it looks like a typical one-dimensional Schrodinger equation, so the energy seems like it should be quantized. But we also know that the energy shouldn't be quantized because we have a free particle, and the way to resolve this is that energy doesn't actually appear in the differential equation in an essential way. To explain this, we define a new variable $\boxed{\rho = kr}$, and this will clear out all of the energy terms: changing variables yields
$$-\frac{d^2 u_{E\ell}}{d\rho^2} + \frac{\ell(\ell+1)}{\rho^2}\, u_{E\ell} = u_{E\ell},$$
and our rescaling has removed the energy $E$ from the equation! It still makes its way into our solution, because $\rho = kr$ does still depend on energy, but we get no quantization in solving the differential equation itself.

But then we can look more carefully at the differential equation, and it turns out this is pretty nasty. Without any one of the terms here, the equation is easy, but whenever we have two derivatives of a function $f$, an $\frac{f}{\rho^2}$ term, and an $f$ term, we're in the Bessel function world.
And spherical Bessel functions are not that bad, but they're a little bit complicated. It's easiest to find solutions
of the form
$$u_{E\ell} = r\, j_\ell(kr),$$
where we don't care about the constant factor between $\rho = kr$ and $r$ because of normalization, and $j_\ell$ is the spherical Bessel function. This means that a complete solution looks like
$$\psi_{E\ell m} = j_\ell(kr)\, Y_{\ell m}(\theta, \phi),$$
where we've just divided $u$ by $r$ to get the familiar form of $\psi$.

Remark 264. There is a $j$-type and an $n$-type spherical Bessel function, but the latter is singular at the origin, so we discard it here.

From this, we can extract some well-known behavior: as $\rho \to 0$, one property of the Bessel function is that
$$\rho\, j_\ell(\rho) \sim \frac{\rho^{\ell+1}}{(2\ell+1)!!}.$$
This is indeed consistent with $u_{E\ell}$ behaving as $r^{\ell+1}$ for small $r$. We also know that as $\rho \to \infty$, the Bessel function behaves as
$$\rho\, j_\ell(\rho) \sim \sin\left( \rho - \frac{\ell\pi}{2} \right).$$
So this behaves like a trigonometric function, because this is a superposition of a sine and a cosine. And the $\frac{\ell\pi}{2}$ term here is just a phase, which is fixed by the fact that our function needs to vanish at the origin.

This gives physicists a lot of opportunities: the free particle should behave like $\sin\left(kr - \frac{\ell\pi}{2}\right)$ for large $r$, so we can consider a localized potential. The solution far away from that localized potential is a superposition of sines and cosines, so it's a phase difference away from the $u_{E\ell}$ we're talking about here. Therefore, we'll have
$$u_{E\ell} \sim \sin\left( kr - \frac{\ell\pi}{2} + \delta_\ell(E) \right),$$
and thus we can see the effect of our potential through the phase shift $\delta_\ell$! In particular, if we do an experiment with particle scattering, we can use that shift to understand more about the potential that scatters our waves.
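The two limiting behaviors of $\rho\, j_\ell(\rho)$ are easy to spot-check for $\ell = 0$ and $\ell = 1$. The sketch below uses the standard closed forms $j_0(\rho) = \sin\rho/\rho$ and $j_1(\rho) = \sin\rho/\rho^2 - \cos\rho/\rho$ (quoted from standard references rather than derived in these notes); the sample points `small` and `large` are arbitrary choices for illustration.

```python
import math

def j0(rho):
    """Spherical Bessel function of order 0 (standard closed form)."""
    return math.sin(rho) / rho

def j1(rho):
    """Spherical Bessel function of order 1 (standard closed form)."""
    return math.sin(rho) / rho**2 - math.cos(rho) / rho

small, large = 1e-2, 200.0

# Small argument: rho * j_l(rho) ~ rho^(l+1) / (2l+1)!!, with 1!! = 1 and 3!! = 3.
ratio_l0 = small * j0(small) / small
ratio_l1 = small * j1(small) / (small**2 / 3.0)

# Large argument: rho * j_l(rho) ~ sin(rho - l * pi / 2).
diff_l0 = abs(large * j0(large) - math.sin(large))
diff_l1 = abs(large * j1(large) - math.sin(large - math.pi / 2))
```

Both ratios are close to 1, `diff_l0` vanishes (the $\ell = 0$ formula is exact), and `diff_l1` is bounded by $1/\rho$, so it shrinks as `large` grows.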


Example 265
If we have an attractive potential, this "pulls the wave function in," so it corresponds to a positive $\delta$. On the other hand, if the potential is repulsive, we "push the wave function out" and get a negative $\delta$.


We'll finish by introducing another example, which is the square well. We studied the infinite square well in the one-dimensional case: it's easy, and it's just a combination of sines and cosines. Then the analogous idea in the three-dimensional case is to take a spherical cavity, in which the particle is free to move for all $r < a$ but has infinite potential past that point. So
$$V(r) = \begin{cases} 0 & r < a \\ \infty & r > a, \end{cases}$$
and we can solve this by just imposing boundary conditions: inside the cavity, solutions will look like
$$u_{E\ell} \sim r\, j_\ell(kr),$$
and we just need to make sure we satisfy the boundary condition
$$j_\ell(ka) = 0.$$

This seems like the most symmetric potential possible, but there isn't really much to say about this physical system: we won't get a lot of energy degeneracies. On the other hand, if we look at something like
$$V(r) = \beta r^2,$$
we'll end up with lots of degeneracies in energy: we'll understand in the coming lectures why that's true!


34 April 15, 2020 


There have been a lot of new ideas with the multiparticle states and tensor products, and we've most recently been talking about angular momentum. We'll still spend time today talking about the former topic, and then angular momentum is basically the main topic of the rest of the semester.

There will be two more problem sets for this class: one due next Friday (Bell inequalities, EPR, some angular momentum) and one due two weeks after that (on addition of angular momentum). We'll have a second test in two weeks, and it'll be similar to the first one.

Fact 266
There will be an anonymous survey about the changes made to this class, and we should fill that out if we have any comments.


We started discussing the no cloning theorem last time. Recall that the idea is that we start with an arbitrary spin state: here, we'll call it $a_+ |+\rangle + a_- |-\rangle$. We're trying to make a photocopy of this state, so we'll also put in a fixed "blank" spin state, say $|+\rangle$. Then a cloning machine would start with these two particles, and we'd end up with two particles that are both in the $a_+ |+\rangle + a_- |-\rangle$ state. This is a deterministic machine, and the no cloning theorem states that under the assumption that our cloning machine is unitary time evolution, we will not be able to clone our particle in general, other than a few select states. (It's true that we can also do measurements, and that would be an interesting research project to look at. But then we start introducing probabilities, and our output becomes nondeterministic.)

If $V$ is the vector space of spin states for each particle, our initial and final states are both in $V \otimes V$, so our cloning machine must be a linear operator in $\mathcal{L}(V \otimes V)$. Specifically, if we call the blank state $|b\rangle$, we must have
$$U : |\psi\rangle \otimes |b\rangle \to e^{i\phi}\, |\psi\rangle \otimes |\psi\rangle$$
be the action of our machine $U$ for all $|\psi\rangle \in V$. (Here, we might as well assume our states are well-normalized: $\langle\psi|\psi\rangle = \langle b|b\rangle = 1$.) The $\phi$ here is a phase. It can depend on $|\psi\rangle$ or $|b\rangle$, but it won't end up being very helpful here.

Let's state the result we're trying to prove:

Theorem 267 (No cloning)
There is no unitary operator $U$ sending $|\psi\rangle \otimes |b\rangle \to e^{i\phi}\, |\psi\rangle \otimes |\psi\rangle$ (for some $\phi$ a function of $|\psi\rangle$ and $|b\rangle$) for all $|\psi\rangle \in V$.


Proof. We'll stop writing the tensor product $\otimes$ symbol from here. Suppose that there is a single state which can be cloned: that means our operator $U$ looks like
$$U : |\psi_1\rangle |b\rangle \to |\psi_1\rangle |\psi_1\rangle\, e^{i\phi_1}.$$

Take norms of the initial and final states: we start with
$$\langle\psi_1|\psi_1\rangle \langle b|b\rangle = 1,$$
and we end up with
$$e^{-i\phi_1} \langle\psi_1|\psi_1\rangle \langle\psi_1|\psi_1\rangle\, e^{i\phi_1} = 1.$$

(We found both of these by writing the bra versions next to the ket versions and doing some rearrangement.) So there's no crazy obstacle here: this preserves the norm in our tensor product space.

Now, there's always a unitary operator which takes a vector $|e_1\rangle$ of length 1 to another vector $|f_1\rangle$ of length 1. The idea here is that we can construct a unitary operator out of this: the operator $|f_1\rangle\langle e_1|$, which will send $|e_1\rangle$ to $|f_1\rangle$, is not quite unitary yet, but we can use Gram-Schmidt to get orthonormal bases $|e_1\rangle, \cdots, |e_n\rangle$ and $|f_1\rangle, \cdots, |f_n\rangle$. Now the operator
$$U = \sum_i |f_i\rangle\langle e_i|$$
is indeed unitary, because it's a change of basis between two orthonormal bases! And we can also check that
$$U^\dagger U = \sum_{i,j} |e_i\rangle\langle f_i|f_j\rangle\langle e_j| = \sum_i |e_i\rangle\langle e_i| = I.$$


And now we can generalize this: suppose we have two orthonormal states $|e_1\rangle, |e_2\rangle$, and we want to send them to two orthonormal states $|f_1\rangle, |f_2\rangle$ respectively. The same Gram-Schmidt argument tells us, again, that extending the bases gives us a unitary operator that does the job. In our original problem, this means that we can indeed clone two orthonormal states in $V$:
$$\boxed{|\psi_1\rangle |b\rangle \to |\psi_1\rangle |\psi_1\rangle\, e^{i\phi_1}, \qquad |\psi_2\rangle |b\rangle \to |\psi_2\rangle |\psi_2\rangle\, e^{i\phi_2}}.$$

Further generalizing, this means that we can have $n$ orthonormal basis vectors in a space of dimension $n$, and we can clone all of these $n$ states (because they are an orthonormal set and are being mapped to an orthonormal set).

So now we're getting to the punchline: suppose the two states that are boxed above are arbitrary, so they're not necessarily orthonormal. We know that the unitary operator should preserve inner products: since $(Uv, Uw) = (v, w)$, we should have that the inner product of $|\psi_1\rangle |\psi_1\rangle\, e^{i\phi_1}$ with $|\psi_2\rangle |\psi_2\rangle\, e^{i\phi_2}$ (the final states) is the same as the inner product of $|\psi_1\rangle |b\rangle$ with $|\psi_2\rangle |b\rangle$ (the initial states). This means that
$$\langle\psi_1|\psi_2\rangle \langle b|b\rangle = e^{i(\phi_2 - \phi_1)} \langle\psi_1|\psi_2\rangle^2 \implies \boxed{\langle\psi_1|\psi_2\rangle \left( 1 - e^{i(\phi_2 - \phi_1)} \langle\psi_1|\psi_2\rangle \right) = 0},$$
so this only works if $\psi_1$ and $\psi_2$ have overlap zero (this is the orthonormal case we've already talked about) or if
$$\langle\psi_1|\psi_2\rangle = e^{-i(\phi_2 - \phi_1)} \implies |\langle\psi_1|\psi_2\rangle|^2 = 1.$$
But the Schwarz inequality tells us that
$$|\langle\psi_1|\psi_2\rangle|^2 \le \langle\psi_1|\psi_1\rangle \langle\psi_2|\psi_2\rangle = 1$$
only has saturation when $\psi_1$ and $\psi_2$ differ by a constant phase $e^{i\alpha}$, which means that they are the same state! And now we have our result: any two distinct states that can both be cloned must be orthogonal, so vector spaces can only allow us to clone up to $n$ orthonormal basis vectors. (We can pick any $n$ such states, but then the cloning machine won't work for any others.)


Corollary 268
In a vector space of dimension $n$, a given cloning machine can clone at most $n$ states, and these must be mutually orthogonal.
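To see the obstruction in the proof with concrete numbers, we can compare inner products before and after a hypothetical cloning step for two non-orthogonal states. This is a minimal sketch; the particular states and the helper names `inner` and `tensor` are choices made for this illustration.

```python
import math

def inner(a, b):
    """<a|b> for state vectors given as lists of complex amplitudes."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def tensor(a, b):
    """Tensor product of two state vectors."""
    return [x * y for x in a for y in b]

blank = [1 + 0j, 0j]                                      # the blank state |b>
psi1 = [1 + 0j, 0j]                                       # |psi_1> = |+>
psi2 = [1 / math.sqrt(2) + 0j, 1 / math.sqrt(2) + 0j]     # |psi_2> = (|+> + |->)/sqrt(2)

# Inner product of the two inputs: <psi_1|psi_2><b|b>
before = inner(tensor(psi1, blank), tensor(psi2, blank))

# Inner product of the two would-be outputs: <psi_1|psi_2>^2 (up to a phase)
after = inner(tensor(psi1, psi1), tensor(psi2, psi2))
```

Here `abs(before)` is $1/\sqrt{2}$ while `abs(after)` is $1/2$. Since a phase cannot change the magnitude, no unitary operator can map both inputs to their clones.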


We'll conclude with an application to quantum computation. Normally, we use bits that take the two values 0 and 1, but quantum bits (or qubits) are quantum states in a two-state system spanned by $|0\rangle$ and $|1\rangle$.

Consider the classical CNOT gate, which takes in two (classical) bits $(x, y)$ and outputs $(x, y \oplus x)$, where $\oplus$ denotes addition mod 2. (In other words, if $x = 0$, nothing happens, and if $x = 1$, we flip the state of $y$: the top bit $x$ "controls the gate.")

The analogous quantum gate does something similar: for the basis states $x, y \in \{0, 1\}$, we take in two states $|x\rangle$ and $|y\rangle$ and we output $|x\rangle$ and $|y \oplus x\rangle$. In other words, there is some unitary operator $U$ such that
$$U : |x\rangle \otimes |y\rangle \to |x\rangle \otimes |y \oplus x\rangle.$$
So now imagine feeding in the state
$$\left( a_0 |0\rangle + a_1 |1\rangle \right) \otimes |0\rangle$$
into our gate. It looks like the CNOT gate might actually clone the first particle (if we naively write it out like in the classical case, saying that the second state is now $|y \oplus x\rangle = |x\rangle$), but we can't use that logic! We have to instead write out the initial state as
$$a_0 |0\rangle |0\rangle + a_1 |1\rangle |0\rangle,$$
and now the gate replaces the first term with $a_0 |0\rangle |0\rangle$ and the second with $a_1 |1\rangle |1\rangle$. So now this gate doesn't copy what's in the top qubit: it gives us an entangled state! Indeed, this circuit only clones the $|0\rangle$ and $|1\rangle$ vectors, but it cannot clone anything else.
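The CNOT argument can be reproduced with a few lines of linear algebra. The sketch below assumes the basis ordering $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ and picks the amplitudes $a_0 = 0.6$, $a_1 = 0.8$ purely as an example; the helper names are illustrative.

```python
# Assumed basis order for two qubits: |00>, |01>, |10>, |11>.
# CNOT flips the second qubit exactly when the first qubit is 1.
CNOT = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

def apply(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def tensor(a, b):
    """Tensor product of two single-qubit states."""
    return [x * y for x in a for y in b]

zero, one = [1.0, 0.0], [0.0, 1.0]
a0, a1 = 0.6, 0.8            # example amplitudes with a0^2 + a1^2 = 1
psi = [a0, a1]

output = apply(CNOT, tensor(psi, zero))   # a0|00> + a1|11>, an entangled state
clone = tensor(psi, psi)                  # what a true copy |psi>|psi> would be

# The basis states themselves are copied correctly.
copies_zero = apply(CNOT, tensor(zero, zero)) == tensor(zero, zero)
copies_one = apply(CNOT, tensor(one, zero)) == tensor(one, one)
```

The `output` vector has no $|01\rangle$ or $|10\rangle$ components, whereas `clone` has the cross terms $a_0 a_1$, so the two differ for any superposition with both amplitudes nonzero.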


35 Angular Momentum, Part 3 


Today, we'll start by solving the spherical square well problem. Recall that this means we want to solve the radial equation
$$-\frac{\hbar^2}{2m} \frac{d^2 u_{E\ell}}{dr^2} + V_{\text{eff}}(r)\, u_{E\ell} = E\, u_{E\ell},$$
where $u = r f(r)$, for the potential
$$V_{\text{eff}}(r) = V(r) + \frac{\hbar^2 \ell(\ell+1)}{2mr^2}, \qquad V(r) = \begin{cases} 0 & r < a \\ \infty & r > a. \end{cases}$$

The first step is to look at the inside of the well: the particle is free in the region $r < a$, which is why we considered the free particle as our first example last time. In the range $r < a$, defining $k = \sqrt{\frac{2mE}{\hbar^2}}$ and $\rho = kr$, our differential equation simplifies to
$$-\frac{d^2 u_{E\ell}}{d\rho^2} + \frac{\ell(\ell+1)}{\rho^2}\, u_{E\ell} = u_{E\ell}.$$

(We've just changed the constants a bit, so that the rescaling gets rid of the explicit energy-dependence.) As we mentioned last time, this equation has Bessel function solutions: it's not a simple sinusoidal or power solution. We'll look at the special case $\ell = 0$, because this is the only case where we don't need Bessel functions: then our equation becomes
$$-\frac{d^2 u_{E0}}{d\rho^2} = u_{E0} \implies u_{E0} = A \sin\rho + B \cos\rho.$$

Remember from last time that $u_{E\ell}$ must behave like $r^{\ell+1}$ near the origin, which indeed happens here as long as the cosine term disappears. And thus
$$u_{E0}(r) \sim \sin\rho = \sin(kr).$$


Because the potential goes to infinity at $r = a$, we need to satisfy the boundary condition $u_{E0}(a) = 0$. This means that $ka = n\pi$, so
$$k = k_n = \frac{n\pi}{a}$$
for some positive integer $n$. So far, everything here is analogous to the one-dimensional infinite square well, and the energies will look like (solving the equation $k = \sqrt{\frac{2mE}{\hbar^2}}$ for $E$)
$$E_{n,0} = \frac{\hbar^2 k_n^2}{2m} = \frac{\hbar^2}{2ma^2} (k_n a)^2 = \frac{\hbar^2}{2ma^2} (n\pi)^2.$$

(These are the energy levels for the $\ell = 0$ states.) Most of these constants are irrelevant, and the important thing to remember is that the constant fraction $\frac{\hbar^2}{2ma^2}$ is the "typical energy" for a system of length scale $a$, and we're just scaling by $(n\pi)^2$. This motivates the rescaling:

Definition 269
Define the unitless quantity (for any $n, \ell$)
$$\mathcal{E}_{n,\ell} = \frac{E_{n,\ell}}{\hbar^2 / (2ma^2)} = \frac{2ma^2}{\hbar^2}\, E_{n,\ell}.$$

This tells us how much bigger an energy level is compared to the natural energy scale for our system.
From here, let's look at the general case: we'll now need to know the zeros of the spherical Bessel function. For example, $j_1(\rho)$ has zeros when $\tan\rho = \rho$, which requires a numerical calculation to solve, and in general the Bessel function zeros can be found online if we need them. We'll use the notation
$$z_{n,\ell} = n\text{th zero of } j_\ell,$$
where all $z$'s are nonzero and $n$ is indexed by positive integers. And now, the energy eigenstates corresponding to an angular momentum $\ell$ yield a boundary condition of
$$u_{E\ell}(a) = 0 \implies k_{n,\ell}\, a = z_{n,\ell}.$$

So our energies look like
$$E_{n,\ell} = \frac{\hbar^2 k_{n,\ell}^2}{2m} = \frac{\hbar^2}{2ma^2} (k_{n,\ell}\, a)^2,$$
so
$$\mathcal{E}_{n,\ell} = (k_{n,\ell}\, a)^2 = z_{n,\ell}^2,$$
which are just the squares of the zeros of the Bessel function! Here's a small table of $\mathcal{E}_{n,\ell}$ values for small $n$ and $\ell$:

      | ℓ = 0 | ℓ = 1 | ℓ = 2 | ℓ = 3
n = 1 | 9.87  | 20.2  | 33.2  | 48.83
n = 2 | 39.48 | 59.7  | 82.7  | 108.5
n = 3 | 88.82 | 119   |       |

The purpose of looking at all of these numbers is to compare the energies in a plot: remember that we do this by plotting one column for each $\ell$, and drawing horizontal marks at various energy levels for each one. Indeed, the ground state energy levels (corresponding to $n = 1$) do get larger for larger $\ell$, as we predicted, but there are never matching energies for two different pairs $(n, \ell)$, which may be surprising for a round, seemingly-symmetric potential!
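The $\ell = 0$ and $\ell = 1$ columns of the table can be reproduced with a short root-finding sketch. It uses the standard closed forms of $j_0$ and $j_1$ and a simple scan-plus-bisection search; the helper names `bisect` and `zeros`, the scan step, and the starting point are all choices made for this illustration.

```python
import math

def j0(rho):
    """Spherical Bessel function of order 0 (standard closed form)."""
    return math.sin(rho) / rho

def j1(rho):
    """Spherical Bessel function of order 1; its zeros satisfy tan(rho) = rho."""
    return math.sin(rho) / rho**2 - math.cos(rho) / rho

def bisect(f, lo, hi, tol=1e-12):
    """Root of f in [lo, hi], assuming f changes sign on the bracket."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_lo > 0) == (f_mid > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def zeros(f, count, step=0.01, start=0.5):
    """First `count` positive zeros of f, found by scanning for sign changes."""
    found = []
    x = start
    while len(found) < count:
        if (f(x) > 0) != (f(x + step) > 0):
            found.append(bisect(f, x, x + step))
        x += step
    return found

# Dimensionless energies: squares of the zeros.
eps_l0 = [z**2 for z in zeros(j0, 3)]   # approximately 9.87, 39.48, 88.83
eps_l1 = [z**2 for z in zeros(j1, 3)]   # approximately 20.19, 59.68, 118.90
```

For $\ell = 0$ the zeros are exactly $n\pi$, so `eps_l0` is just $(n\pi)^2$; for $\ell = 1$ the values agree with the table to the precision shown there.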


Example 270
The next system we'll solve is the three-dimensional (isotropic) harmonic oscillator.

This system has the potential
$$V = \frac{1}{2} m\omega^2 \left( x^2 + y^2 + z^2 \right) = \frac{1}{2} m\omega^2 r^2.$$
It turns out that there is much more symmetry in this system than there was in the spherical well! To start building our spectrum, note that the Hamiltonian looks like
$$H = \hbar\omega \left( \hat{N}_1 + \hat{N}_2 + \hat{N}_3 + \frac{3}{2} \right),$$
where the $\hat{N}_i$ are the number operators in the three directions. Note that the state space of our system can be derived from $\mathcal{H}_1$, the state space of the one-dimensional harmonic oscillator. Conceptually, a 3-D SHO comes from the creation and annihilation operators for $x$, $y$, and $z$, so building a state of a three-dimensional oscillator depends on finding the number of $a_x^\dagger$s, $a_y^\dagger$s, and $a_z^\dagger$s: thus, we actually have a tensor product
$$\mathcal{H}_{\text{3D SHO}} = \mathcal{H}_1 \otimes \mathcal{H}_1 \otimes \mathcal{H}_1,$$
where any basis vector comes from picking some number of $a_x^\dagger$s, $a_y^\dagger$s, and $a_z^\dagger$s. So even though we introduced tensor products as corresponding to multiparticle systems, we're tensoring different attributes for the same particle here. This is just the correct way to combine data in quantum mechanics.


But now we can understand the energies $E_{n\ell}$ by plotting them in an energy diagram.

• Our ground state $|0\rangle$ has number eigenvalues $N_1 = N_2 = N_3 = 0$, and the energy is $E = \frac{3}{2}\hbar\omega$: this is the single state with the lowest possible energy, and because the potential is spherically symmetric, it must fit into some angular momentum multiplet. We want to know what the value of $\ell$ is, but there's a single state here with no multiplicity, while (for example) $\ell = 1$ corresponds to three linearly independent states for $m = -1, 0, 1$. So our ground state must have angular momentum $\ell = 0$.

• For the next energy level, there are three different states: $a_x^\dagger|0\rangle$, $a_y^\dagger|0\rangle$, $a_z^\dagger|0\rangle$. Each of these states has energy $\hbar\omega\left(1 + \frac{3}{2}\right) = \frac{5}{2}\hbar\omega$, and the multiplicity of 3 means we can argue that this corresponds to $\ell = 1$. After all, $\ell = 0, 1, 2, 3, \cdots$ have a $1, 3, 5, 7, \cdots$-fold degeneracy in $m$, so the only way to get three states in a single multiplet is the second of these options.


• The level after that, with $E = \frac{7}{2}\hbar\omega$, has six states:
\[ (a_x^\dagger)^2|0\rangle, \quad (a_y^\dagger)^2|0\rangle, \quad (a_z^\dagger)^2|0\rangle, \quad a_x^\dagger a_y^\dagger|0\rangle, \quad a_x^\dagger a_z^\dagger|0\rangle, \quad a_y^\dagger a_z^\dagger|0\rangle. \]
Each of these has $N = N_1 + N_2 + N_3 = 2$, and the six states must organize themselves into various different values of $\ell$. We can't use $\ell = 3$ (that yields seven states), so we must either have two different sets of $\ell = 1$ states (3+3) or a set of $\ell = 2$ and a set of $\ell = 0$ states (5+1). But we can't build with two $\ell = 1$ multiplets, because that means we'd need to put two horizontal lines at the same spot on our energy diagram for the same value of $\ell$, and this is not allowed! So we instead split our states of $N = 2$ into five states of $\ell = 2$ and one state of $\ell = 0$. (We can write this as a direct sum of the two spaces.) So this is already interesting: we get an identical energy in different columns of $\ell$. (We'll understand where this matching comes from later on in the course.)


• Finally, let's consider $N = 3$. This case has ten possible states: we can cube one of the raising operators (3 options), use one of each raising operator (1 option), or use two of one and one of another (6 options) to get a state of energy $\frac{9}{2}\hbar\omega$. This can either originate out of $(\ell = 4) \oplus (\ell = 0)$ (with 9+1 states) or $(\ell = 3) \oplus (\ell = 1)$ (with 7+3 states). It turns out to be the latter (the best way to understand this is to look at the lowest energies for each value of $\ell$, which go up in step).

• If we want to count the number of states for $N = 4$ or higher $N$, note that we just need to find nonnegative integers $n_x, n_y, n_z$ with $n_x + n_y + n_z = N$. (This corresponds to the state $(a_x^\dagger)^{n_x} (a_y^\dagger)^{n_y} (a_z^\dagger)^{n_z}|0\rangle$.) Doing casework on the value of $n_x$, this yields a total of
\[ 1 + 2 + \cdots + (N+1) = \frac{(N+1)(N+2)}{2} \]
states of a given sum of number operators $N$. And we can carry out the same argument to understand that $N = 4$ most likely corresponds to $\ell = 4, 2, 0$, $N = 5$ corresponds to $\ell = 5, 3, 1$, and so on.
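The counting above is easy to check by brute force. The following is a minimal sketch (the function names are illustrative, not from lecture): it enumerates the triples $(n_x, n_y, n_z)$ directly and compares the count with both the closed form and the total dimension of the multiplets $\ell = N, N-2, \ldots$

```python
from itertools import product

def count_states(N):
    """Number of (n_x, n_y, n_z) with n_x + n_y + n_z = N, by brute force."""
    return sum(1 for n in product(range(N + 1), repeat=3) if sum(n) == N)

def multiplet_dimension(N):
    """Total dimension of the multiplets l = N, N-2, ... (down to 0 or 1)."""
    return sum(2 * l + 1 for l in range(N, -1, -2))

for N in range(8):
    closed_form = (N + 1) * (N + 2) // 2
    print(N, count_states(N), closed_form, multiplet_dimension(N))
```

The three columns agree for every $N$, which is consistent with (though not a proof of) the claimed decomposition into $\ell = N, N-2, \ldots$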


So the above analysis told us the $\ell$ values at a given energy, and we can also use this to see the energy levels at a given $\ell$. Basically, at a given $\ell$, consecutive energy levels are separated by two jumps of $\hbar\omega$.


To understand this system better, remember that we discussed that we can replace the operators $a_x, a_y$ with $a_R, a_L$, which allows us to write
\[ L_z = \hbar(N_R - N_L) \]
(we derived this for a two-dimensional oscillator, but it's still true in three dimensions). It's a bit harder to find the angular momentum operators $L_x, L_y$, but we can do that, and this time this is an actual angular momentum, not an abstract one like in the 2D case.


From here, we'll build states in the same way as before, doing casework on the value of $N = N_1 + N_2 + N_3$.

• For $N = 1$, there are three states: $a_R^\dagger|0\rangle$, $a_z^\dagger|0\rangle$, $a_L^\dagger|0\rangle$, which correspond to angular momenta $L_z$ of $\hbar, 0, -\hbar$ respectively. So that gives us all of the structure for the $\ell = 1$ multiplet: we get the three states with $m$ values of $+1, 0, -1$.

• Looking at the extreme cases, for $N = 2$, we have $a_R^\dagger a_R^\dagger|0\rangle$ with an angular momentum $L_z = 2\hbar$, and this is the highest possible value of $L_z$. In general, $L_z = N\hbar$ is the maximum possible value for a state with total number $N$, because each $a_R^\dagger$ adds $1\hbar$ to $L_z$, each $a_L^\dagger$ removes $1\hbar$, and each $a_z^\dagger$ does nothing. So there are going to be $2N + 1$ different values of $L_z$ for each $N$ (from $-N\hbar$ to $N\hbar$).

• So the only state with maximum angular momentum is $(a_R^\dagger)^N|0\rangle$, the only state with one unit less of angular momentum is $(a_R^\dagger)^{N-1} a_z^\dagger|0\rangle$, and then the next unit of angular momentum has two states: $(a_R^\dagger)^{N-2}(a_z^\dagger)^2|0\rangle$, as well as $(a_R^\dagger)^{N-1} a_L^\dagger|0\rangle$. But the point is that because we have some state with maximum angular momentum, we'll get a multiplet corresponding to $\ell = N$: this uses up the state of $L_z = N\hbar$, the state of $L_z = (N-1)\hbar$, and one of the states of $L_z = (N-2)\hbar$. And now the highest angular momentum left is $L_z = (N-2)\hbar$, so that corresponds to $\ell = N - 2$: this explains the jump by two!


Indeed, we'll find two states that we can write at $L_z = (N-3)\hbar$, three states at $L_z = (N-4)\hbar$ (which explains why we have an $\ell = N - 4$), and so on. And we can conclude our study of this system by understanding how we could have come up with this from the beginning without building it up: the answer is that certain operators commute with the Hamiltonian (meaning they don't change the energy) and indeed move us from one value of $\ell$ to another (in other words, they move us to the right or the left by two values of $\ell$ in our above diagram). As a hint, an operator of the form $a_x^\dagger a_y$ does not change the energy, because it destroys one level in the y-direction and adds one in the x-direction. There are lots of hidden symmetries in the operators of this form!
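The peeling argument above can be automated. This is a small sketch under the assumptions stated in the text (each $a_R^\dagger$ adds one unit of $L_z$, each $a_L^\dagger$ removes one, $a_z^\dagger$ adds none); the helper names are made up for illustration. It tallies the states at each $m = L_z/\hbar$ and repeatedly removes a full multiplet starting from the largest remaining $m$.

```python
from collections import Counter

def lz_counts(N):
    """Count states with n_R + n_L + n_z = N, grouped by m = n_R - n_L."""
    counts = Counter()
    for n_r in range(N + 1):
        for n_l in range(N + 1 - n_r):
            counts[n_r - n_l] += 1  # n_z = N - n_r - n_l is then fixed
    return counts

def peel_multiplets(N):
    """Remove multiplets m = -l..l, always starting from the largest m left."""
    counts = lz_counts(N)
    ells = []
    while sum(counts.values()) > 0:
        ell = max(m for m, c in counts.items() if c > 0)
        ells.append(ell)
        for m in range(-ell, ell + 1):
            counts[m] -= 1
    return ells

for N in range(6):
    print(N, peel_multiplets(N))
```

For each $N$ the list of $\ell$ values steps down by two, matching the pattern $\ell = N, N-2, \ldots$ argued above.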


Example 271 
Our next system is that of the hydrogen atom: we have
\[ H = \frac{p^2}{2m} - \frac{e^2}{r}. \]
Here, $m$ is the reduced mass of the proton-electron system: it's roughly equal to $m_e$ (the mass of the electron). There is a natural length scale in this system, known as the Bohr radius. We find this by setting $p = \frac{\hbar}{a_0}$ for some length $a_0$, and then we set the two terms of the Hamiltonian equal (ignoring constants because we care about units):
\[ \frac{\hbar^2}{m a_0^2} = \frac{e^2}{a_0} \implies a_0 = \frac{\hbar^2}{m e^2} \approx 0.529 \times 10^{-10} \text{ m}. \]
The $e^2$ in the denominator is important: this means that if we make the interaction between the electron and proton small, then the hydrogen atom's radius gets large. The corresponding energy scale is $\frac{e^2}{a_0}$, and half of that quantity, $\frac{e^2}{2a_0}$, is a famous number: 13.6 eV.
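As a numerical sanity check on these two scales, here is a short sketch. It assumes SI constants (rounded CODATA values typed in by hand) and uses the standard conversion that the Gaussian $e^2$ corresponds to $e^2/(4\pi\epsilon_0)$ in SI; it also uses the electron mass rather than the reduced mass, which changes the result by about 0.05 percent.

```python
import math

HBAR = 1.054571817e-34      # J s
M_E = 9.1093837015e-31      # kg (electron mass, standing in for the reduced mass)
E_CHARGE = 1.602176634e-19  # C
EPS0 = 8.8541878128e-12     # F/m

# The Gaussian-units e^2 corresponds to e^2 / (4 pi eps0) in SI units.
e2 = E_CHARGE**2 / (4 * math.pi * EPS0)

a0 = HBAR**2 / (M_E * e2)               # Bohr radius in meters
rydberg_eV = e2 / (2 * a0) / E_CHARGE   # e^2 / (2 a0), converted from J to eV

print(f"a0 = {a0:.4e} m")
print(f"e^2 / (2 a0) = {rydberg_eV:.3f} eV")
```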

Our main question here will be finding the (energy) spectrum: there's an elegant method for finding the ground state. We can write (for some specific constants $\gamma, \beta$) the Hamiltonian as
\[ H = \gamma + \frac{1}{2m} \sum_{k=1}^{3} \left( \hat{p}_k + i\beta \frac{\hat{x}_k}{r} \right) \left( \hat{p}_k - i\beta \frac{\hat{x}_k}{r} \right); \]
this is basically a factorized version of our above expression. (Remember that we need to be careful, because the operators $\hat{p}_k$ and $\hat{x}_k$ don't commute.) But now we can view the second factor $\left( \hat{p}_k - i\beta \frac{\hat{x}_k}{r} \right)$ as an operator and the first factor $\left( \hat{p}_k + i\beta \frac{\hat{x}_k}{r} \right)$ as its dagger: in a way analogous to the harmonic oscillator, the ground state should be killed by our operator, meaning
\[ \left( \hat{p}_k - i\beta \frac{\hat{x}_k}{r} \right) |\psi_{gs}\rangle = 0, \]
and the energy of this ground state is just the constant $\gamma$. (This looks like three equations, but it's just a single equation if we have a spherically symmetric state.)
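The constants can be pinned down by expanding the product. The following is a worked sketch, assuming the factors are $\hat{p}_k \pm i\beta \hat{x}_k / r$ and using $[\hat{p}_k, f(\mathbf{x})] = -i\hbar\, \partial_k f$; the resulting values of $\beta$ and $\gamma$ are derived here rather than quoted from lecture.

```latex
\begin{aligned}
\sum_{k}\Big(\hat p_k + i\beta\frac{\hat x_k}{r}\Big)\Big(\hat p_k - i\beta\frac{\hat x_k}{r}\Big)
 &= \hat p^2 + \beta^2\sum_k \frac{\hat x_k \hat x_k}{r^2} - i\beta\sum_k\Big[\hat p_k, \frac{\hat x_k}{r}\Big] \\
 &= \hat p^2 + \beta^2 - \frac{2\hbar\beta}{r},
 \qquad\text{since } \sum_k \partial_k \frac{x_k}{r} = \frac{3}{r} - \frac{1}{r} = \frac{2}{r}. \\
H &= \frac{\hat p^2}{2m} - \frac{\hbar\beta}{m}\,\frac{1}{r} + \gamma + \frac{\beta^2}{2m}
 \;\Longrightarrow\;
 \beta = \frac{m e^2}{\hbar} = \frac{\hbar}{a_0}, \qquad
 \gamma = -\frac{\beta^2}{2m} = -\frac{e^2}{2 a_0}.
\end{aligned}
```

With these values the ground state energy $\gamma$ is $-13.6$ eV, consistent with the energy scale found from the Bohr radius.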


Looking at the whole spectrum, there are interesting degeneracies just like in the 3-dimensional harmonic oscillator: we'll claim this result for now, and we'll show where this structure comes from later in the class.

[Figure: energy level diagram of the hydrogen atom, with one column for each of $\ell = 0, 1, 2, 3$.]

If we label the states for each $\ell$ with $\nu = 0, \nu = 1, \cdots$ from bottom to top, notice that the states with the same $n = \nu + \ell + 1$ have the same energy. That energy turns out to be
\[ E_{n\ell} = -\frac{e^2}{2a_0} \frac{1}{n^2}, \qquad n = \nu + \ell + 1, \]
where $0 \le \ell \le n - 1$, and in order to understand more of the structure here, we're going to need to introduce the idea of the Runge-Lenz vector.


Example 272 
The Runge-Lenz vector comes from classical mechanics: consider a Hamiltonian for an elliptical orbit
\[ H = \frac{p^2}{2m} + V(r), \]
where the (classical) force is $\vec{F} = -\nabla V = -V'(r) \frac{\vec{r}}{r}$. In this classical situation, we know that
\[ \frac{d\vec{p}}{dt} = \vec{F} = -V'(r) \frac{\vec{r}}{r}, \]
and because we have a central potential $V(r)$,
\[ \frac{d\vec{L}}{dt} = 0 \]
(there is no torque on the particle). It turns out that there is a (surprising) quantity here that is conserved: we start with the quantity $\vec{p} \times \vec{L}$ (this is a bit unmotivated, but it yields an interesting result), and then we can do some algebra to find
\[ \frac{d}{dt}\left( \vec{p} \times \vec{L} \right) = m V'(r) r^2 \frac{d}{dt}\left( \frac{\vec{r}}{r} \right). \]
So this gives us a conservation law when $V'(r) r^2$ is a constant, which we'll call $e^2$. And this occurs exactly when
\[ V'(r) = \frac{e^2}{r^2}, \qquad V(r) = -\frac{e^2}{r}, \]
which is the potential of the hydrogen atom: it's a $\frac{1}{r^2}$ force field. In such a situation, there is a conservation law
\[ \frac{d}{dt}\left( \vec{p} \times \vec{L} - m e^2 \frac{\vec{r}}{r} \right) = 0. \]
For the sake of convenience, defining
\[ \vec{R} = \frac{\vec{p} \times \vec{L}}{m e^2} - \frac{\vec{r}}{r}, \]
we know that $\frac{d\vec{R}}{dt} = 0$, and we have a conserved quantity that is unitless. This $\vec{R}$ turns out to be conserved in the quantum mechanical case too: $\vec{R}$ is an operator that commutes with the Hamiltonian $H$. (We'd have to hermiticize $\vec{p} \times \vec{L}$ for things to work out.) But what's important here is that this conservation law helps us understand the degeneracies in the hydrogen atom!

If we first consider a circular orbit, $\vec{p}$ is tangential and $\vec{L}$ is out of the plane of the orbit, so $\vec{p} \times \vec{L}$ points radially out of the circle. Combining this with the radial vector means that $\vec{R}$ is some vector that points radially outward, and it must be conserved: thus, $\vec{R}$ must actually be the zero vector in the circular case.

But $\vec{R}$ isn't zero in an elliptical orbit: if we repeat the same argument, it turns out that the vector will always point along the major axis of the ellipse! (And this always happens in a $\frac{1}{r}$ potential, though it's good to note that Einstein's theory of gravity has a different potential, so the ellipse precesses.) And the magnitude of this unitless $\vec{R}$ turns out to be exactly the eccentricity of our ellipse.
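The classical conservation law can be checked numerically. The following is a rough sketch, not part of the lecture: it integrates a planar orbit in the potential $V(r) = -e^2/r$ with a hand-written fourth-order Runge-Kutta step, in assumed units where $m = e^2 = 1$, and evaluates $\vec{R}$ before and after. The initial condition is an arbitrary choice that starts at the point of closest approach; its eccentricity works out to 0.44.

```python
import math

E2 = 1.0   # strength e^2 of the potential V(r) = -e^2 / r (assumed units)
M = 1.0    # particle mass (assumed units)

def deriv(state):
    """Time derivative of (x, y, vx, vy) for the force F = -(e^2 / r^2) r_hat."""
    x, y, vx, vy = state
    r3 = (x * x + y * y) ** 1.5
    return (vx, vy, -E2 * x / (M * r3), -E2 * y / (M * r3))

def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = deriv(state)
    k2 = deriv(shifted(k1, dt / 2))
    k3 = deriv(shifted(k2, dt / 2))
    k4 = deriv(shifted(k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def runge_lenz(state):
    """R = (p x L) / (m e^2) - r / |r| for motion in the xy-plane."""
    x, y, vx, vy = state
    px, py = M * vx, M * vy
    lz = x * py - y * px          # L points along z for planar motion
    r = math.hypot(x, y)
    return (py * lz / (M * E2) - x / r, -px * lz / (M * E2) - y / r)

def evolve(state, dt=0.002, steps=5000):
    """Advance the orbit by steps * dt units of time."""
    for _ in range(steps):
        state = rk4_step(state, dt)
    return state

start = (1.0, 0.0, 0.0, 1.2)      # elliptical orbit, starting at closest approach
print(runge_lenz(start), runge_lenz(evolve(start)))
```

Setting the initial speed to 1 instead of 1.2 gives a circular orbit, for which `runge_lenz` returns the zero vector, in line with the argument above.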


36 April 22, 2020 


Our second midterm will be in a week, and some materials, including past tests and a formula sheet, have been posted for us to work on. We'll discuss test review next recitation, and it's recommended that we look at the formula sheet to study, since there are a lot more formulas than last time. (Basically, we should be able to realize what each one means and what it can be used for.) Logistics will be pretty similar to the first test, and we'll experiment a bit more with partial credit.


Fact 273 


Everything we've learned up to today's lecture and Friday's problem set is fair game for the test. Monday's lecture will begin discussing addition of angular momentum, but it won't be on the test.


Each midterm, as well as the final, is now worth 15 percent of our grade. (And the final will not be uniformly covered, because we just don't have enough time.) We should expect a set of true/false questions on each test as well.


With that, we'll move to class material: first of all, to answer a question posed, we'll consider the identity
\[ L^2 = r^2 p^2 - (\vec{r} \cdot \vec{p})^2 + i\hbar\, \vec{r} \cdot \vec{p}. \]
The Hamiltonian of the hydrogen atom looks like
\[ H = \frac{p^2}{2m} + V(r), \qquad V(r) = -\frac{e^2}{r}, \]
so we need to write $p^2$ in terms of $L^2$ in the identity above to get the Laplacian term (involving $p^2$). We want to divide by $r^2$, and the idea is that the operator $\frac{1}{r^2}$ acts on the wavefunction $\psi$ by just multiplying, so we multiply by the operator $\frac{1}{r^2}$ on the left, since $r, p$ do not commute. This yields
\[ \frac{1}{r^2}\left( L^2 + (\vec{r} \cdot \vec{p})^2 - i\hbar\, \vec{r} \cdot \vec{p} \right) = p^2, \]
and we can then substitute that into the Hamiltonian, which yields
\[ \frac{p^2}{2m} = \frac{L^2}{2mr^2} + \frac{1}{2mr^2}\left( (\vec{r} \cdot \vec{p})^2 - i\hbar\, \vec{r} \cdot \vec{p} \right). \]


Remark 274. When we see a fraction of operators $\frac{A}{B}$, this is usually ill-defined: it could either mean $B^{-1}A$ or $AB^{-1}$, so we shouldn't write fractions unless we have something like $\frac{L^2}{r^2}$, because the operators $L^2$ and $\frac{1}{r^2}$ commute. Similarly, we should be careful with things like
\[ \frac{1}{AB} = (AB)^{-1} = B^{-1}A^{-1} = \frac{1}{B} \cdot \frac{1}{A}. \]


We'll discuss some aspects of EPR now, using work by Greenberger, Horne, and Zeilinger (GHZ).


Example 275 


Suppose we have three particles A, B, C in an entangled state (emerging from some kind of an elementary particle decay), and they go to Alice, Bob, and Charlie.


EPR's logic would tell us that these particles have attributes: we can measure the x-component or y-component of the spin of some particle, and we always get the same result for a given particle because these observables are properties. Even if we don't know why the particles have these values for the observables, there are some hidden variables or attributes that determine all physical properties of the state. Call these hidden variables $\lambda$.

Suppose, for example, that Alice measures the spin of her particle along the x-direction: then the result she will get looks like
\[ A(x; \lambda) \in \{\pm 1\}, \]
where we're measuring $\sigma_x$ instead of $S_x$. Here, we're saying that given $\lambda$, the value of $A$ is determined: there's no probability going on. Similarly, Alice can measure $A(y; \lambda) \in \{\pm 1\}$, and Bob and Charlie can also measure along the x- and y-directions to get answers of 1 or $-1$, depending on $\lambda$.
The main result we care about is that we can produce an entangled state such that we have the following xyy correlations:
\[ A(x, \lambda) B(y, \lambda) C(y, \lambda) = 1, \]
\[ A(y, \lambda) B(x, \lambda) C(y, \lambda) = 1, \]
\[ A(y, \lambda) B(y, \lambda) C(x, \lambda) = 1. \]
In other words, when one of our particles is measured in the x-direction and the other two particles are measured in the y-direction, their product is always 1 (either all $+1$s, or one $+1$ and two $-1$s). But now we can multiply these equations together to find
\[ A(x, \lambda) B(x, \lambda) C(x, \lambda) = 1, \]


since the square of any measurement we make here is always 1, so the y-terms all go away. But let's see what quantum mechanics predicts about this: consider the state
\[ |\Phi\rangle = \frac{1}{\sqrt{2}} \left( |+\rangle_A |+\rangle_B |+\rangle_C - |-\rangle_A |-\rangle_B |-\rangle_C \right). \]
We can see the analog of the xyy correlations now: if we consider the operator
\[ \sigma_x^A \otimes \sigma_y^B \otimes \sigma_y^C, \]


this operator acts on $\Phi$ (because it acts on the three particles), and it is Hermitian (because each of the three operators is Hermitian), meaning it is something that can be measured. But remember that
\[ \sigma_x |\pm\rangle = |\mp\rangle, \qquad \sigma_y |\pm\rangle = \pm i |\mp\rangle, \]
so the operator will turn $|+\rangle|+\rangle|+\rangle$ into $|-\rangle|-\rangle|-\rangle$, except with two factors of $i$ from the two $\sigma_y$s, meaning it will turn into $-|-\rangle|-\rangle|-\rangle$. Similarly, the $-|-\rangle|-\rangle|-\rangle$ will become $+|+\rangle|+\rangle|+\rangle$, so $\Phi$ is actually an eigenstate of our operator:
\[ \left( \sigma_x^A \otimes \sigma_y^B \otimes \sigma_y^C \right) |\Phi\rangle = |\Phi\rangle \]
with eigenvalue 1. The same logic works for the other two operators, and thus we've verified the xyy correlation property for this GHZ state.


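But now we can try seeing what happens when we measure all three particles along the x-direction:
\[ \left( \sigma_x^A \otimes \sigma_x^B \otimes \sigma_x^C \right) |\Phi\rangle = \frac{1}{\sqrt{2}} \left( |-\rangle_A |-\rangle_B |-\rangle_C - |+\rangle_A |+\rangle_B |+\rangle_C \right) = -|\Phi\rangle, \]
so we actually get an eigenvalue of $-1$! So the answer is exactly opposite from what we see in the classical case, and thus we've already found a way to violate the classical assumptions. And we don't even need to repeat this argument: any one single measurement gives the wrong answer. And one thing to learn here is that
\[ (\sigma_x \otimes \sigma_y \otimes \sigma_y)(\sigma_y \otimes \sigma_x \otimes \sigma_y)(\sigma_y \otimes \sigma_y \otimes \sigma_x) = (\sigma_x \sigma_y \sigma_y) \otimes (\sigma_y \sigma_x \sigma_y) \otimes (\sigma_y \sigma_y \sigma_x), \]
and the reason this doesn't reduce to $\sigma_x \otimes \sigma_x \otimes \sigma_x$ is that the matrices don't commute in the second term. In fact, we just end up with
\[ \sigma_x \otimes (-\sigma_x) \otimes \sigma_x = -(\sigma_x \otimes \sigma_x \otimes \sigma_x), \]
as we've already demonstrated.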
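These eigenvalues are quick to confirm with explicit matrices. Below is a minimal sketch in plain Python (no external libraries; the helper names are illustrative). It assumes the basis ordering $|+\rangle = (1, 0)$, $|-\rangle = (0, 1)$, so that index 0 of the eight-component vector is $|{+}{+}{+}\rangle$ and index 7 is $|{-}{-}{-}\rangle$.

```python
import math

SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]

def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    nb = len(b)
    n = len(a) * nb
    return [[a[i // nb][j // nb] * b[i % nb][j % nb] for j in range(n)]
            for i in range(n)]

def kron3(a, b, c):
    return kron(kron(a, b), c)

def apply(mat, vec):
    """Matrix-vector product."""
    return [sum(row[j] * vec[j] for j in range(len(vec))) for row in mat]

# GHZ state (|+++> - |--->) / sqrt(2)
ghz = [0] * 8
ghz[0] = 1 / math.sqrt(2)
ghz[7] = -1 / math.sqrt(2)

def eigenvalue(op, state):
    """Return lam with op|state> = lam|state>; fails if state is not an eigenvector."""
    out = apply(op, state)
    k = max(range(len(state)), key=lambda i: abs(state[i]))
    lam = out[k] / state[k]
    assert all(abs(o - lam * s) < 1e-12 for o, s in zip(out, state))
    return lam

for name, op in [("xyy", kron3(SX, SY, SY)), ("yxy", kron3(SY, SX, SY)),
                 ("yyx", kron3(SY, SY, SX)), ("xxx", kron3(SX, SX, SX))]:
    print(name, eigenvalue(op, ghz))
```

The three mixed operators give $+1$ and the all-x operator gives $-1$, which is the sign that no assignment of hidden values $\pm 1$ can reproduce.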


37 Addition of Angular Momentum, Part 1 


We'll start this new topic by introducing some elements of perturbation theory, which is discussed much more in 8.06. The idea is that many of the results of perturbation theory will be important for understanding various examples that come up in this last part of the class. (We won't do any derivations here, just a general primer of results.)

Suppose that we have a Hamiltonian
\[ H = H^{(0)} + \delta H, \]
where $H^{(0)}$ is known and $\delta H$ is some small perturbation. Suppose that our eigenstates of the known Hamiltonian are indexed by $k$, such that
\[ H^{(0)} |k^{(0)}\rangle = E_k^{(0)} |k^{(0)}\rangle. \]
(The $(0)$s reflect the fact that we're working with the original Hamiltonian.) This means we know the (degenerate or nondegenerate) spectrum of $H^{(0)}$, and we want to understand what the perturbation does to this spectrum.

Each of the nondegenerate states will be perturbed a little, and the degenerate states will typically split apart from each other as well: $\delta H$ may move some energies up more than others. We'll look at each of these cases now.

In the nondegenerate case, there is a single eigenstate indexed by $k$, and the state $|k^{(0)}\rangle$, as well as the energy $E_k^{(0)}$, will change by a bit:
\[ E_k = E_k^{(0)} + \delta E_k + O((\delta H)^2). \]
Here, we're making a first-order approximation in the Hamiltonian correction $\delta H$. It turns out the formula is of the form
\[ \delta E_k = \langle k^{(0)} | \delta H | k^{(0)} \rangle. \]
What's striking here is that the correction to the energy doesn't require the exact form of the eigenvectors: we just need to look at the expectation value on the original eigenstate.


On the other hand, when we have degenerate states (meaning they have the same value of $E_k^{(0)}$), we can understand what happens to the splitting at first order. Suppose that our energy level is $E_n^{(0)}$ and there are $N$ total degenerate energy eigenstates at that level: we'll label them as $|n^{(0)}, \ell\rangle$, where we have the additional indexing by the integer $1 \le \ell \le N$. Assume that we've also chosen these states to be orthonormal, so that $\langle n^{(0)}, \ell | n^{(0)}, k \rangle = \delta_{\ell k}$.

To lowest (zeroth) order, all of these states have the same energy, which is the eigenvalue
\[ H^{(0)} |n^{(0)}, \ell\rangle = E_n^{(0)} |n^{(0)}, \ell\rangle. \]
To understand what happens to the splitting, note that our $N$ states span a vector space $V_N$: what we need to do is compute the matrix for $\delta H$ in our space $V_N$. This means we need the matrix elements
\[ (\delta H)_{k\ell} = \langle n^{(0)}, k | \delta H | n^{(0)}, \ell \rangle: \]
this gives us an $N \times N$ matrix.

And then we get our answer by diagonalizing that $N \times N$ matrix. We need to find the eigenvalues and eigenvectors of this (Hermitian) matrix, and if the energies split, some of the eigenvalues will be different. Labeling those $N$ eigenvectors $|\psi_I^{(0)}\rangle$ (where $1 \le I \le N$) and the corresponding $N$ eigenvalues $\delta E_{n,I}$, we can think of each eigenvector as a column vector, which corresponds to a linear combination of our original basis states $|n^{(0)}, k\rangle$:
\[ |\psi_I^{(0)}\rangle = \sum_k |n^{(0)}, k\rangle\, a_k^{(I)}. \]
These $|\psi_I^{(0)}\rangle$s are then the (approximate) new energy eigenstates for our perturbed Hamiltonian, and the energies are $E_{n,I} = E_n^{(0)} + \delta E_{n,I} + O(\delta H^2)$. So it takes a bit more time to state what happens, but we just diagonalize the matrix of $\delta H$ restricted to the degenerate subspace, which allows us to separate the corrections to the energy $E_n^{(0)}$.
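As a minimal illustration of the degenerate recipe, consider a hypothetical two-fold degenerate level (this toy matrix is an assumption made for the example, not a system from lecture) where the perturbation only connects the two degenerate states, with a real coupling $\epsilon$:

```latex
\delta H\big|_{V_2} =
\begin{pmatrix} 0 & \epsilon \\ \epsilon & 0 \end{pmatrix}
\;\Longrightarrow\;
\delta E_{n,\pm} = \pm\epsilon, \qquad
|\psi_\pm^{(0)}\rangle = \frac{1}{\sqrt{2}}\Big( |n^{(0)}, 1\rangle \pm |n^{(0)}, 2\rangle \Big).
```

Here both diagonal elements $\langle n^{(0)}, k | \delta H | n^{(0)}, k \rangle$ vanish, so the nondegenerate formula applied to the original basis would wrongly predict no shift at all; the splitting $\pm\epsilon$ only appears after switching to the basis that diagonalizes $\delta H$.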


Now, we're ready to move on to addition of angular momentum, and we'll start by stating the fundamental result: 


Theorem 276 


Suppose there is a set of operators $J_i^{(1)}$ on the state space $V_1$, satisfying the algebra of angular momentum (that is, $[J_i^{(1)}, J_j^{(1)}] = i\hbar\, \epsilon_{ijk} J_k^{(1)}$), as well as another set of operators $J_i^{(2)}$ on the state space $V_2$. Then there is a new angular momentum
\[ J_i = J_i^{(1)} \otimes 1 + 1 \otimes J_i^{(2)} \]
which satisfies the algebra of angular momentum in the space $V_1 \otimes V_2$.

Note that we needed to state this in terms of the tensor product space, because $J^{(1)}$ cannot act on $V_2$ (and vice versa). But soon we'll just call this operator $J^{(1)} + J^{(2)}$ to make the notation easier. Note that if we had tried to add any other linear combination of these two angular momenta, it wouldn't work!


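Proof. We need to verify the angular momentum relation:
\[ [J_i, J_j] = \left[ J_i^{(1)} \otimes 1 + 1 \otimes J_i^{(2)},\ J_j^{(1)} \otimes 1 + 1 \otimes J_j^{(2)} \right]. \]
But now if we look at the cross terms, the commutator will be zero, because (for example)
\[ \left[ J_i^{(1)} \otimes 1,\ 1 \otimes J_j^{(2)} \right] = J_i^{(1)} \otimes J_j^{(2)} - J_i^{(1)} \otimes J_j^{(2)} = 0. \]
So operators originally living in different vector spaces will commute, and what we're left with is
\[ \left[ J_i^{(1)} \otimes 1,\ J_j^{(1)} \otimes 1 \right] + \left[ 1 \otimes J_i^{(2)},\ 1 \otimes J_j^{(2)} \right], \]
and then the $1$s don't really do anything: we just get
\[ \left[ J_i^{(1)}, J_j^{(1)} \right] \otimes 1 + 1 \otimes \left[ J_i^{(2)}, J_j^{(2)} \right]. \]
By the formulas for angular momentum in the individual vector spaces, this is exactly what we want: it evaluates to
\[ i\hbar\, \epsilon_{ijk} \left( J_k^{(1)} \otimes 1 + 1 \otimes J_k^{(2)} \right) = i\hbar\, \epsilon_{ijk} J_k. \]
And with a bit more practice, we won't need to use the tensor products when we're working with these objects.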
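The theorem, and the remark that other linear combinations fail, can be spot-checked on the smallest example. The sketch below is an illustration under stated assumptions: two spin-1/2 systems, units with $\hbar = 1$, plain nested lists for matrices, and made-up helper names. It only tests the single relation $[J_x, J_y] = iJ_z$ for $J = S \otimes 1 + c\,(1 \otimes S)$.

```python
SX = [[0, 0.5], [0.5, 0]]
SY = [[0, -0.5j], [0.5j, 0]]
SZ = [[0.5, 0], [0, -0.5]]
ID = [[1, 0], [0, 1]]

def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    nb = len(b)
    n = len(a) * nb
    return [[a[i // nb][j // nb] * b[i % nb][j % nb] for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def combo(ca, a, cb, b):
    """Entrywise ca * a + cb * b."""
    return [[ca * x + cb * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def commutator(a, b):
    return combo(1, matmul(a, b), -1, matmul(b, a))

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def total(s, c=1):
    """J = S (x) 1 + c * (1 (x) S) on the two-spin space."""
    return combo(1, kron(s, ID), c, kron(ID, s))

def satisfies_algebra(c):
    """Check [J_x, J_y] = i J_z for the combination with coefficient c."""
    jx, jy, jz = (total(s, c) for s in (SX, SY, SZ))
    i_jz = [[1j * x for x in row] for row in jz]
    return close(commutator(jx, jy), i_jz)

print(satisfies_algebra(1), satisfies_algebra(-1), satisfies_algebra(2))
```

Only the coefficient $c = 1$ passes: the commutator of the second terms picks up $c^2$ while the right-hand side only carries $c$, so $c$ must satisfy $c^2 = c$.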


Example 277 


Our first example will be spin-orbit coupling: we'll have a hydrogen atom, with an additional term $\Delta H = -\vec{\mu} \cdot \vec{B}$.

We didn't have a $\vec{B}$ in the original hydrogen atom, but we'll say that $\vec{\mu}$ is the magnetic dipole moment for the electron:
\[ \vec{\mu} = -g \frac{e}{2m} \vec{S}, \qquad g = 2, \]
where $\vec{S}$ is the spin. If we use Gaussian units, we can instead state this as
\[ \vec{\mu} = -g \frac{e}{2mc} \vec{S}, \]
because this allows us to estimate terms more easily. And the magnetic field $\vec{B}$ comes from the electron's interaction with the proton: since the proton is going around the electron (in the electron's frame), we have a current that generates a magnetic field, and this current is then going to be proportional to the angular momentum $\vec{L}$.

Specifically, let's fix coordinates so that the electron is at the origin at some point in time, moving into the plane with some velocity $\vec{v}$, and the proton is to the left of the electron so that there is an electric field $\vec{E}$ pointing to the right (remember that electric field is not the same as electric force). Relativistically, the electric and magnetic fields that we see in different reference frames are actually different: thus, the magnetic field from the point of view of the electron will be
\[ \vec{B} = -\frac{\vec{v}}{c} \times \vec{E}, \]
and this (by the right-hand rule) yields a magnetic field pointing upward: this is consistent with the picture of having a proton going around in circles and creating a current. We'll remove the negative sign by using $\vec{E} \times \vec{v}$ instead, and we can calculate the electric field by looking at the potential energy
\[ V(r) = -\frac{e^2}{r}, \qquad \vec{F} = -\nabla V = -\frac{dV}{dr} \frac{\vec{r}}{r} \implies \vec{E} = \frac{1}{e} \frac{dV}{dr} \frac{\vec{r}}{r}, \]
where the last step comes from replacing electric force with electric field (using $\vec{F} = -e\vec{E}$ for the electron) and also noting that the electric field points radially outward. And now plugging things in,
\[ \vec{B} = \frac{1}{c} \vec{E} \times \vec{v} = \frac{1}{ec} \frac{1}{r} \frac{dV}{dr} (\vec{r} \times \vec{v}), \]
and we borrow a factor of $m$ to write this in terms of the angular momentum:
\[ \vec{B} = \frac{1}{emc} \frac{1}{r} \frac{dV}{dr} \vec{L}. \]
Thus, we can finally calculate the perturbation:
\[ \Delta H = -\vec{\mu} \cdot \vec{B} = \frac{g}{2m^2c^2} \frac{1}{r} \frac{dV}{dr} (\vec{S} \cdot \vec{L}). \]
Unfortunately, it turns out there's a relativistic error here: Thomas precession tells us that we must replace $g$ with $g - 1$ (because the interactions of magnetic fields and dipoles change the precession rate when we don't have an inertial system), which means that we basically lose a factor of 2: plugging everything in (with $\frac{dV}{dr} = \frac{e^2}{r^2}$), we end up with
\[ \boxed{\Delta H = \frac{e^2}{2m^2c^2r^3} (\vec{S} \cdot \vec{L})}. \]


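To estimate this, recall that we have the Bohr radius $a_0 = \frac{\hbar^2}{me^2}$, and we also have the fine structure constant
\[ \alpha = \frac{e^2}{\hbar c} \approx \frac{1}{137}. \]
Since $\vec{S}$ has multiples of $\hbar$ and so does $\vec{L}$, we can estimate the dot product to be on the order of $\hbar^2$, and (replacing $r$ with $a_0$) we end up with
\[ \Delta H \sim \frac{e^2 \hbar^2}{m^2 c^2 a_0^3} = \frac{e^2}{a_0} \cdot \frac{\hbar^2}{m^2 c^2 a_0^2}. \]
The ground state energy of the hydrogen atom has magnitude of order $E_{gs} \sim \frac{e^2}{a_0}$, and now
\[ \frac{\Delta H}{E_{gs}} = \frac{\hbar^2}{m^2 c^2 a_0^2} = \frac{\hbar^2}{m^2 c^2} \cdot \frac{m^2 e^4}{\hbar^4} = \frac{e^4}{\hbar^2 c^2} = \alpha^2. \]
This means the ratio of the spin-orbit coupling energy to the ground state energy is $\alpha^2$, which is pretty small. This is called the fine structure of the hydrogen atom: the splitting of energies is therefore going to be pretty small.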
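To put a number on "pretty small", here is a short order-of-magnitude sketch. It assumes SI constants (rounded CODATA values typed in by hand) and the SI form $\alpha = e^2 / (4\pi\epsilon_0 \hbar c)$ of the fine structure constant; the 13.6 eV scale is the one quoted earlier for hydrogen.

```python
import math

HBAR = 1.054571817e-34      # J s
C = 2.99792458e8            # m / s
E_CHARGE = 1.602176634e-19  # C
EPS0 = 8.8541878128e-12     # F/m

# Fine structure constant in SI form.
alpha = E_CHARGE**2 / (4 * math.pi * EPS0 * HBAR * C)

ground_state_eV = 13.6                       # magnitude of the hydrogen ground state energy
fine_structure_eV = alpha**2 * ground_state_eV  # order-of-magnitude scale of the splitting

print(f"1 / alpha = {1 / alpha:.3f}")
print(f"alpha^2 = {alpha**2:.3e}")
print(f"fine-structure scale ~ {fine_structure_eV:.1e} eV")
```

This is only the scale suggested by the estimate above; the actual splittings carry additional numerical factors.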
Remark 278. Note that we've been using Gaussian units this whole time: in SI units, we instead have
\[ \Delta H = \frac{e^2}{8\pi\epsilon_0} \frac{1}{m^2c^2r^3} (\vec{S} \cdot \vec{L}), \]
where we've already replaced $g$ with $g - 1$ in this expression.


Now that we have the perturbation $\Delta H$, we can work with this a bit more to understand what happens to the hydrogen spectrum. We'll start with the simplest state with angular momentum, the state of $\ell = 1, n = 2$ (recall that $n = \nu + \ell + 1$, so this is the lowest energy state for $\ell = 1$). Remember that this is actually a multiplet of states: $\ell = 1$ has states of $m = -1$, $0$, and $1$, and our states can also be spin up or down, so we actually have $3 \times 2 = 6$ total states of the form
\[ |\ell, m\rangle \otimes |s, m_s\rangle, \]
where we're fixing $\ell = 1$ and $s = \frac{1}{2}$, but we can have $m = -1, 0, 1$, and we can have $m_s = \pm\frac{1}{2}$. Remember that $|\ell, m\rangle$ denotes the angular part of our wavefunction, though there is also a radial dependence which luckily only depends on $\ell$ (so it basically factors out of any consideration we're doing here). All in all, this means we have six degenerate states, and we'll need to use the perturbation (Feynman-Hellmann) result carefully: we need to select the correct basis of eigenstates from our $6 \times 6$ matrix $\delta H$. Theoretically, we could find $\langle i | \Delta H | j \rangle$ for all matrix elements $1 \le i, j \le 6$ and find the eigenvectors as $\delta H$ goes to 0: basically, we want to find a basis $|1'\rangle, |2'\rangle, \cdots, |6'\rangle$ of this energy level such that the matrix $\Delta H'$ becomes diagonal, so that the changes of energies are just those diagonal entries.

What we'll find is that we can relate the basis of states $|\ell = 1, m\rangle \otimes |s = \frac{1}{2}, m_s\rangle$ to a basis which tracks total angular momentum. To understand this, let's look some more at the $\vec{L} \cdot \vec{S}$ operator: it acts on our tensor product, and it's actually defined as
\[ \vec{L} \cdot \vec{S} = L_1 \otimes S_1 + L_2 \otimes S_2 + L_3 \otimes S_3 = \sum_i L_i \otimes S_i, \]
and we also have the sum of the angular momenta
\[ J_i = L_i \otimes 1 + 1 \otimes S_i. \]
The idea is to square $J$:
\[ J^2 = \sum_i J_i J_i \]
(this is not a tensor product: we're applying $J_i$ twice), which can be written as
\[ J^2 = \sum_i (L_i \otimes 1 + 1 \otimes S_i)(L_i \otimes 1 + 1 \otimes S_i), \]
and now we can rewrite this as
\[ J^2 = L^2 \otimes 1 + 2 \sum_i L_i \otimes S_i + 1 \otimes S^2 \]
(and when we have more practice, this can basically just be written as $J^2 = L^2 + S^2 + 2\vec{L} \cdot \vec{S}$, though that can be confusing when we think about how it acts on states). Therefore, we have the formula for that term of our Hamiltonian:
\[ \vec{L} \cdot \vec{S} = \frac{1}{2}\left( J^2 - L^2 - S^2 \right). \]
And now $L^2$ is a diagonal matrix in our original basis, because the eigenstates we're choosing all have $\ell = 1$, and similarly $S^2$ is a diagonal matrix: both are actually constants times the identity. That means that $\vec{L} \cdot \vec{S}$ is basically just a linear transformation of $J^2$, and thus the total angular momentum is indeed the important quantity here.

We'll call the original basis states $|1, m\rangle \otimes |\frac{1}{2}, m_s\rangle$ uncoupled states (there's no entanglement going on), and all of these uncoupled states are eigenvectors of $L^2, S^2$. But because all of the eigenvalues are actually equal (each one has an eigenvalue of $\hbar^2 \ell(\ell+1) = 2\hbar^2$ for $L^2$ and $\hbar^2 \cdot \frac{1}{2} \cdot \frac{3}{2} = \frac{3}{4}\hbar^2$ for $S^2$), if we take any linear combination of our uncoupled states, we will still have eigenvectors of both $L^2$ and $S^2$ (with the same eigenvalues). Thus, we just need to select specific linear combinations that are also eigenstates of $J^2$, and then we'll be happy.

Remember that $L^2$ and $S^2$ commute with all $L_i$ and $S_i$, respectively, and $L_i$s and $S_i$s always commute because they act on different state spaces. So $L^2$ is known as a Casimir operator: it commutes with everything constructed with $L$s and $S$s. So when we consider our matrix elements $\langle i | \Delta H | j \rangle$, remember that we're considering overlaps of the states $|i\rangle$, which have a radial component as well as an angular and spin wavefunction, which we'll call $|\tilde{i}\rangle$. Since
\[ \Delta H = \frac{\beta}{r^3} (\vec{L} \cdot \vec{S}) \]
for some constant $\beta$, the inner product calculation $\langle i | \Delta H | j \rangle$ looks like
\[ \beta \int r^2\, dr\, \psi^*(r) \frac{1}{r^3} \psi(r) \cdot \langle \tilde{i} | \vec{L} \cdot \vec{S} | \tilde{j} \rangle. \]
So we just select these $\tilde{i}, \tilde{j}$ to be eigenstates of $\vec{L} \cdot \vec{S}$, which will give us
\[ \langle \tilde{i} | \vec{L} \cdot \vec{S} | \tilde{j} \rangle \propto \delta_{ij}, \]
and the inner product will be diagonal and indicate the relevant energy corrections $\Delta E$.
We'll now turn our attention back to our complete set of commuting observables: normally for an unperturbed hydrogen atom, we have $H_0$, $L^2$, and $L_z$ (where $L$ tells us about the orbital angular momentum), and we can't add any more operators because $L_x$ and $L_y$ do not commute with $L_z$. But adding in the spin, we can now have the complete set
\[ \{ H_0, L^2, L_z, S^2, S_z \}, \]
because the original Hamiltonian doesn't actually know about the spin at all. On the other hand, if we have $\vec{L} \cdot \vec{S}$ coupling, we now have a full Hamiltonian $H'$ instead of $H_0$. We want to check whether $L^2$ and $L_z$ can be put in our complete set again: we know that $L^2$ commutes with $H_0$, and we just need to check whether it commutes with $\vec{L} \cdot \vec{S}$, which it does. But the $L_i$s do not actually commute with $\vec{L} \cdot \vec{S}$, so this time we can't keep $L_z$ anymore. Similarly, $S^2$ is okay, but $S_z$ is not, and so our new set currently sits at
\[ \{ H', L^2, S^2 \}. \]
But we can actually add something new this time: the total angular momentum in the form of $J^2$. To check this, note the following:

• $J^2$ is built with $L_i$s and $S_i$s, which commute with $H_0$, so $J^2$ and $H_0$ commute.

• $J^2$ commutes with $\vec{L} \cdot \vec{S}$, because we can remember that
\[ [J^2, \vec{L} \cdot \vec{S}] = \left[ J^2, \frac{1}{2}\left( J^2 - L^2 - S^2 \right) \right], \]
and each term commutes.

• Similarly, $J^2$ commutes with $L^2, S^2$.

So we can add $J^2$ to this, and the natural question is whether we can also add $J_z$ (analogous to adding $L_z$ or $S_z$). We go through the argument again: $J_z = L_z + S_z$ commutes with $H_0$ because both $L_z$ and $S_z$ do, and $J_z$ commutes with the remaining operators (including $\vec{L} \cdot \vec{S}$) because it commutes with $J^2, L^2, S^2$. We can't add multiple $J_i$s, so we get our final set of Hermitian commuting observables which we can simultaneously diagonalize:
\[ \boxed{\{ H', L^2, S^2, J^2, J_z \}}. \]


But in our analysis, we won't be finding exact energy eigenstates: we'll use the Feynman-Hellmann method of diagonalizing $J^2$ instead. Remember that $L$ and $S$ have eigenvalues proportional to $\hbar$, so we often divide through to get rid of that. We have the very useful formula, which is the action of our lowering and raising operators $J_\pm$:
\[ J_\pm |j, m\rangle = \hbar \sqrt{j(j+1) - m(m \pm 1)}\ |j, m \pm 1\rangle. \]


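We can write our uncoupled states in the following groups:
\[
\begin{aligned}
\tfrac{J_z}{\hbar} = \tfrac{3}{2} &: \quad |1, 1\rangle \otimes |\tfrac{1}{2}, \tfrac{1}{2}\rangle \\
\tfrac{J_z}{\hbar} = \tfrac{1}{2} &: \quad |1, 0\rangle \otimes |\tfrac{1}{2}, \tfrac{1}{2}\rangle, \quad |1, 1\rangle \otimes |\tfrac{1}{2}, -\tfrac{1}{2}\rangle \\
\tfrac{J_z}{\hbar} = -\tfrac{1}{2} &: \quad |1, 0\rangle \otimes |\tfrac{1}{2}, -\tfrac{1}{2}\rangle, \quad |1, -1\rangle \otimes |\tfrac{1}{2}, \tfrac{1}{2}\rangle \\
\tfrac{J_z}{\hbar} = -\tfrac{3}{2} &: \quad |1, -1\rangle \otimes |\tfrac{1}{2}, -\tfrac{1}{2}\rangle
\end{aligned}
\]
This is because the value of $\frac{J_z}{\hbar}$ is basically just $m + m_s$, where $m$ is one of $1, 0, -1$ and $m_s$ is either $\frac{1}{2}$ or $-\frac{1}{2}$. So $J_z$ can be diagonalized without forming linear combinations, but we haven't diagonalized $J^2$ yet: we want to find the eigenstates $|j, m\rangle$ which have eigenvalue $\hbar m$ for $J_z$ and eigenvalue $\hbar^2 j(j+1)$ for $J^2$.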


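And now we think about this by thinking about $J$ as an abstract angular momentum. We know that $\frac{J_z}{\hbar}$ is at most $\frac{3}{2}$ here, and $-j \le m \le j$, so we must have an eigenstate of $J^2$ which has $j = \frac{3}{2}$: we couldn't have something like $j = 2$, because that would require us to have a state with $J_z = 2\hbar$ in the multiplet, which we definitely cannot have from our above listing. So there are four states here:
\[ \left| j = \tfrac{3}{2}, m \right\rangle, \qquad m = \tfrac{3}{2},\ \tfrac{1}{2},\ -\tfrac{1}{2},\ -\tfrac{3}{2}. \]
We must have $m = \frac{3}{2}$ only coming from the top state by our above argument, so
\[ \boxed{\left| j = \tfrac{3}{2}, m = \tfrac{3}{2} \right\rangle = |1, 1\rangle \otimes \left| \tfrac{1}{2}, \tfrac{1}{2} \right\rangle}. \]
Similarly, for $m = -\frac{3}{2}$, we can only use the bottom state:
\[ \left| j = \tfrac{3}{2}, m = -\tfrac{3}{2} \right\rangle = |1, -1\rangle \otimes \left| \tfrac{1}{2}, -\tfrac{1}{2} \right\rangle. \]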

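But for each of $m = \pm\frac{1}{2}$, we'll need to take some linear combination of the states in each of the rows for $\frac{J_z}{\hbar}$, and it's not so easy to describe them directly right now. And this accounts for four states, and the remaining two states must live in a $j = \frac{1}{2}$ multiplet (it can't be $j = 0$ because all of our remaining $m$-eigenvalues are $\pm\frac{1}{2}$). Therefore, we can write what we've found as
\[ \boxed{(\ell = 1) \otimes \left( s = \tfrac{1}{2} \right) = \left( j = \tfrac{3}{2} \right) \oplus \left( j = \tfrac{1}{2} \right)}: \]
both vector spaces have dimension $3 \times 2 = 4 + 2 = 6$. Before we find the energy splittings, let's try to finish constructing the other four coupled states in our $j = \frac{3}{2}$ and $j = \frac{1}{2}$ multiplets. We'll act with the operator $J_- = L_- + S_-$ on the boxed expression for $|j = \frac{3}{2}, m = \frac{3}{2}\rangle$: for the left side, we have an equation for how $J_-$ acts on this state, which is
\[ J_- \left| \tfrac{3}{2}, \tfrac{3}{2} \right\rangle = \hbar \sqrt{\tfrac{3}{2} \cdot \tfrac{5}{2} - \tfrac{3}{2} \cdot \tfrac{1}{2}}\ \left| \tfrac{3}{2}, \tfrac{1}{2} \right\rangle = \hbar\sqrt{3}\ \left| \tfrac{3}{2}, \tfrac{1}{2} \right\rangle. \]
To deal with the right hand side, we'll act with the operator $J_- = L_- \otimes 1 + 1 \otimes S_-$. This yields
\[ \left( L_- |1, 1\rangle \right) \otimes \left| \tfrac{1}{2}, \tfrac{1}{2} \right\rangle + |1, 1\rangle \otimes \left( S_- \left| \tfrac{1}{2}, \tfrac{1}{2} \right\rangle \right), \]
and now we use the formula for lowering operators again:
\[ = \hbar\sqrt{1 \cdot 2 - 1 \cdot 0}\ |1, 0\rangle \otimes \left| \tfrac{1}{2}, \tfrac{1}{2} \right\rangle + |1, 1\rangle \otimes \hbar\sqrt{\tfrac{1}{2} \cdot \tfrac{3}{2} - \tfrac{1}{2} \cdot \left( -\tfrac{1}{2} \right)}\ \left| \tfrac{1}{2}, -\tfrac{1}{2} \right\rangle, \]
which evaluates to
\[ \hbar\sqrt{2}\ |1, 0\rangle \otimes \left| \tfrac{1}{2}, \tfrac{1}{2} \right\rangle + \hbar\ |1, 1\rangle \otimes \left| \tfrac{1}{2}, -\tfrac{1}{2} \right\rangle. \]
So now we can set the two sides equal, and we find that
\[ \left| j = \tfrac{3}{2}, m = \tfrac{1}{2} \right\rangle = \sqrt{\tfrac{2}{3}}\ |1, 0\rangle \otimes \left| \tfrac{1}{2}, \tfrac{1}{2} \right\rangle + \sqrt{\tfrac{1}{3}}\ |1, 1\rangle \otimes \left| \tfrac{1}{2}, -\tfrac{1}{2} \right\rangle. \]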

We can do a similar thing to find $|j = \frac{3}{2}, m = -\frac{1}{2}\rangle$ by raising $|j = \frac{3}{2}, m = -\frac{3}{2}\rangle$ with the $J_+$ operator: the calculations are very similar, and we end up with
\[ \left| j = \tfrac{3}{2}, m = -\tfrac{1}{2} \right\rangle = \sqrt{\tfrac{2}{3}}\ |1, 0\rangle \otimes \left| \tfrac{1}{2}, -\tfrac{1}{2} \right\rangle + \sqrt{\tfrac{1}{3}}\ |1, -1\rangle \otimes \left| \tfrac{1}{2}, \tfrac{1}{2} \right\rangle. \]
Indeed, these ended up being linear combinations of the states where $\frac{J_z}{\hbar}$ are $\pm\frac{1}{2}$, respectively, and the states have ended up normalized as well.

This means we've now constructed the entire $j = \frac{3}{2}$ multiplet: this isn't necessary if we just care about the energy splittings, but it's nice to have a concrete expression.


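Now we just need to build the $j = \frac12$ multiplet, and we can do that in a few different ways. We want $|\frac12, \frac12\rangle$ and $|\frac12, -\frac12\rangle$, which are going to use the same uncoupled states as $|\frac32, \frac12\rangle$ and $|\frac32, -\frac12\rangle$, but the key idea is that we're forming an orthonormal basis (because eigenstates of Hermitian operators with different eigenvalues are orthogonal). So up to a sign, there's only one possible solution: it's going to be

$$\left|j = \tfrac12, m = \tfrac12\right\rangle = -\sqrt{\tfrac13}\, |1, 0\rangle \otimes \left|\tfrac12, \tfrac12\right\rangle + \sqrt{\tfrac23}\, |1, 1\rangle \otimes \left|\tfrac12, -\tfrac12\right\rangle,$$

$$\left|j = \tfrac12, m = -\tfrac12\right\rangle = \sqrt{\tfrac13}\, |1, 0\rangle \otimes \left|\tfrac12, -\tfrac12\right\rangle - \sqrt{\tfrac23}\, |1, -1\rangle \otimes \left|\tfrac12, \tfrac12\right\rangle.$$

(The central idea here is that the vector $(a, b)$ is orthogonal to $(-b, a)$, so that's all we need to do to our coefficients.)

And we can indeed check that these are states with $j = \frac12$: we've chosen our signs so that having $J_-$ act on $|j = \frac12, m = \frac12\rangle$ will yield $\hbar\, |j = \frac12, m = -\frac12\rangle$.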


And now we can wrap everything up: remember that our energy perturbation looks like

$$\Delta H = \frac{e^2}{8\pi\epsilon_0} \frac{1}{m^2c^2r^3} \cdot \frac12\left(J^2 - L^2 - S^2\right),$$

but because we're looking at the $\ell = 1$ case, the eigenvalue of $L^2$ is always $\hbar^2 \cdot 1 \cdot 2 = 2\hbar^2$, and similarly the eigenvalue of $S^2$ is always $\hbar^2 \cdot \frac12 \cdot \frac32 = \frac34\hbar^2$. Thus,

$$\Delta H = \frac{e^2}{8\pi\epsilon_0} \frac{1}{m^2c^2r^3} \cdot \frac12\left(J^2 - \frac{11}{4}\hbar^2\right).$$

Since we're working with eigenstates of $J^2$ in our coupled basis, this parenthetical term is just $\hbar^2\left(j(j+1) - \frac{11}{4}\right)$. The matrix $\Delta H$ is now a diagonal matrix because our new states are orthogonal and eigenstates of $J^2$, which is what we were trying to achieve all along. Now we can finally write down the perturbation of energy in our new states, which is

$$\Delta E_{j,m} = \langle n = 2, \ell = 1; j, m | \Delta H | n = 2, \ell = 1; j, m \rangle = \frac{e^2\hbar^2}{16\pi\epsilon_0 m^2c^2} \left\langle \frac{1}{r^3} \right\rangle_{2,1} \left(j(j+1) - \frac{11}{4}\right).$$

The expectation value of $\frac{1}{r^3}$ is known: in general, it turns out that

$$\left\langle \frac{1}{r^3} \right\rangle_{n,\ell} = \frac{1}{n^3\, \ell\left(\ell + \frac12\right)(\ell + 1)\, a_0^3}.$$

But we'll just set

$$\Delta E_0 = \frac{e^2\hbar^2}{16\pi\epsilon_0 m^2c^2} \left\langle \frac{1}{r^3} \right\rangle_{2,1},$$

which is some energy which is on the order of $\alpha^2$ compared to the ground state energy of the hydrogen atom, and now

$$\Delta E_{j,m} = \Delta E_0 \left(j(j+1) - \frac{11}{4}\right).$$

This means that we started off with six degenerate states at this level $\ell = 1, n = 2$. Four of them have the same energy after splitting: plugging in $j = \frac32$, we find that those four states go up by $\Delta E_0$. The other two states have a different energy after splitting: plugging in $j = \frac12$, those two states will go down by $2\Delta E_0$. And we've solved an interesting problem without needing to write down the complicated six-by-six matrix! In general, we'll figure out how to figure out the right hand sides of tensor product equations like

$$j_1 \otimes j_2 = \cdots \oplus \cdots,$$

and if we care about energy splittings, that's all we will need to know.
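The splitting pattern can be double-checked with exact rational arithmetic. The following is a minimal sketch, assuming energies are measured in units of $\Delta E_0$; the function name `spin_orbit_shift` is a hypothetical helper introduced here for illustration, not something from the lecture.

```python
from fractions import Fraction


def spin_orbit_shift(j, l=Fraction(1), s=Fraction(1, 2)):
    """Return Delta E_j / Delta E_0 = j(j+1) - l(l+1) - s(s+1), exactly."""
    return j * (j + 1) - l * (l + 1) - s * (s + 1)


# The two multiplets that appear in (l = 1) x (s = 1/2).
shifts = {j: spin_orbit_shift(j) for j in (Fraction(3, 2), Fraction(1, 2))}

# Weight each shift by the number of states 2j + 1 in its multiplet.
weighted_sum = sum((2 * j + 1) * shift for j, shift in shifts.items())

for j, shift in shifts.items():
    print(f"j = {j}: {2 * j + 1} states shift by {shift} * dE0")
print("degeneracy-weighted sum of shifts:", weighted_sum)
```

The weighted sum vanishing reflects the fact that $\vec L \cdot \vec S$ is traceless on the six-dimensional space, so the perturbation only redistributes the levels around the unperturbed energy.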


38 April 27, 2020 


We'll be spending most of the time in the rest of this class on addition of angular momentum, but today we'll mostly 
focus on the upcoming test for Wednesday. (As a housekeeping reminder, we have just one more homework assignment 
— it’s due next Friday. The general pace of this class will go down in the last two weeks because of this exam.) 
Problems on uncertainty and compatible observables will be the first explicitly covered topic on this exam, and 
questions up to problem set 7 are fair game (so up to angular momentum). We'll spend this recitation doing some 


practice problems. 


Problem 279 


What are the traces of $J_x$ and $J^2$ for the $j = \frac32$ multiplet?

We know that there are four different allowed values of $m$ for this value of $j$, meaning there are four basis vectors that span the multiplet: they are $|j = \frac32, m\rangle$ for $m = \frac32, \frac12, -\frac12, -\frac32$. Thus, the matrices $J_x$ and $J^2$ are both $4 \times 4$ matrices, meaning the trace of any operator $A$ looks like

$$\mathrm{tr}(A) = \sum_m \left\langle \tfrac32, m \right| A \left| \tfrac32, m \right\rangle,$$

where we sum over the values of $m$ above.

We know that all states here are eigenvectors of $J^2$ with eigenvalue $\hbar^2 j(j+1) = \frac{15}{4}\hbar^2$ (so the matrix is diagonal with all diagonal entries $\frac{15}{4}\hbar^2$), so the trace for $J^2$ will be $4 \cdot \frac{15}{4}\hbar^2 = \boxed{15\hbar^2}$. To find the trace for $J_x$, we can write it as $\frac{J_+ + J_-}{2}$ and find the traces of the raising and lowering operators: since $J_\pm$ sends each $|m\rangle$ to a multiple of $|m \pm 1\rangle$, no state is left invariant and every diagonal matrix element vanishes. Both of those traces are zero, so the answer is just $\boxed{0}$ for $J_x$.

Alternatively, remember that $J_z$ is the operator

$$J_z = \begin{pmatrix} \frac{3\hbar}{2} & 0 & 0 & 0 \\ 0 & \frac{\hbar}{2} & 0 & 0 \\ 0 & 0 & -\frac{\hbar}{2} & 0 \\ 0 & 0 & 0 & -\frac{3\hbar}{2} \end{pmatrix}$$

(the eigenvalues of $J_z$ are $\hbar m$, so we have a diagonal matrix). And the trace of this matrix is 0, so we should expect that the trace of $J_x$ is also zero. Indeed, we can write this as a commutator, which always has trace zero for finite-dimensional vector spaces:

$$\mathrm{tr}(J_x) = \mathrm{tr}\left(\frac{1}{i\hbar}[J_y, J_z]\right) = \frac{1}{i\hbar}\left(\mathrm{tr}(J_yJ_z) - \mathrm{tr}(J_zJ_y)\right) = 0.$$

Indeed, this means that any angular momentum operator will always have zero trace.


One thing to keep in mind is that the identity operator

$$I = \frac{1}{i\hbar}[\hat x, \hat p]$$

does have infinite trace, even though it is written as a commutator. But this is just because we're working with infinite dimensional vector spaces, where we have to be more careful.
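The two traces from Problem 279 can also be checked by building the matrices explicitly. This is a minimal sketch in units where $\hbar = 1$, using plain nested lists; the helper names (`spin_matrices`, `matmul`, `trace`) are assumptions made for this illustration. It uses $J_x = \frac12(J_+ + J_-)$ and $J^2 = J_z^2 + \frac12(J_+J_- + J_-J_+)$ so that every entry stays real.

```python
import math


def spin_matrices(two_j):
    """Return (J_z, J_+, J_-) for j = two_j / 2 in the basis m = j, j-1, ..., -j (hbar = 1)."""
    j = two_j / 2
    ms = [j - k for k in range(two_j + 1)]
    dim = len(ms)
    jz = [[ms[r] if r == c else 0.0 for c in range(dim)] for r in range(dim)]
    jplus = [[0.0] * dim for _ in range(dim)]
    for col in range(1, dim):
        m = ms[col]
        # <j, m+1| J_+ |j, m> = sqrt(j(j+1) - m(m+1)); the state m+1 sits one row above.
        jplus[col - 1][col] = math.sqrt(j * (j + 1) - m * (m + 1))
    # J_- is the transpose of J_+ because all entries are real.
    jminus = [[jplus[c][r] for c in range(dim)] for r in range(dim)]
    return jz, jplus, jminus


def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)] for r in range(n)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


jz, jplus, jminus = spin_matrices(3)  # j = 3/2
dim = len(jz)
jx = [[(jplus[r][c] + jminus[r][c]) / 2 for c in range(dim)] for r in range(dim)]
pm, mp, zz = matmul(jplus, jminus), matmul(jminus, jplus), matmul(jz, jz)
j_squared = [[zz[r][c] + (pm[r][c] + mp[r][c]) / 2 for c in range(dim)] for r in range(dim)]

trace_jx = trace(jx)
trace_j_squared = trace(j_squared)
print("tr(J_x) =", trace_jx)
print("tr(J^2) =", trace_j_squared)
```

Because floating-point square roots are involved, the trace of $J^2$ should be compared to 15 with a small tolerance rather than exactly.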


Problem 280 
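Define (for any $\gamma \in \mathbb{R}$)

$$S(\gamma) = \exp\left(-\frac{\gamma}{2}\left(\hat a^\dagger \hat a^\dagger - \hat a \hat a\right)\right).$$

Calculate $f(\gamma) = S^\dagger(\gamma)\, \hat a\, S(\gamma)$ in terms of $\hat a^\dagger$ and $\hat a$.

We know that $S(\gamma)$ is a unitary operator, because $-\frac{\gamma}{2}(\hat a^\dagger \hat a^\dagger - \hat a \hat a)$ (the expression inside the exponential) is anti-Hermitian. (One notable thing that we can verify is that $(e^A)^\dagger = e^{A^\dagger}$.) That means that one way we can calculate this is to compute the commutator in

$$S^\dagger(\gamma)\, \hat a\, S(\gamma) = S^\dagger(\gamma) S(\gamma)\, \hat a + S^\dagger(\gamma) [\hat a, S(\gamma)],$$

or to use the formula $e^A B e^{-A} = B + [A, B] + \frac{1}{2!}[A, [A, B]] + \frac{1}{3!}[A, [A, [A, B]]] + \cdots$. But another way is to consider the derivative

$$\frac{df}{d\gamma} = \frac{dS^\dagger}{d\gamma}\, \hat a\, S(\gamma) + S^\dagger(\gamma)\, \hat a\, \frac{dS}{d\gamma};$$

bringing down the terms in the chain rule (by putting them next to the middle term) yields the commutator

$$\frac{df}{d\gamma} = S^\dagger(\gamma) \left[\frac12\left(\hat a^\dagger \hat a^\dagger - \hat a \hat a\right), \hat a\right] S(\gamma),$$

and this commutator turns out to be $-\hat a^\dagger$. And taking a second derivative yields

$$\frac{d^2 f}{d\gamma^2} = S^\dagger(\gamma)\, \hat a\, S(\gamma) = f,$$

which means that $f = A\cosh\gamma + B\sinh\gamma$ for some operators $A, B$. Using the initial conditions $f(0) = \hat a$ and $f'(0) = -\hat a^\dagger$ yields $\boxed{f = \cosh\gamma\, \hat a - \sinh\gamma\, \hat a^\dagger}$.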
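For reference, the two commutators used in this argument can be worked out explicitly. The following is a sketch, assuming only $[\hat a, \hat a^\dagger] = 1$ and writing $K = \frac12(\hat a^\dagger \hat a^\dagger - \hat a \hat a)$ as a shorthand introduced here (so that $S = e^{-\gamma K}$ and $S^\dagger = e^{\gamma K}$):

```latex
\begin{align*}
[K, \hat a] &= \tfrac12 [\hat a^\dagger \hat a^\dagger, \hat a]
  = \tfrac12 \left( \hat a^\dagger [\hat a^\dagger, \hat a] + [\hat a^\dagger, \hat a]\, \hat a^\dagger \right)
  = -\hat a^\dagger, \\
[K, \hat a^\dagger] &= -\tfrac12 [\hat a \hat a, \hat a^\dagger]
  = -\tfrac12 \left( \hat a [\hat a, \hat a^\dagger] + [\hat a, \hat a^\dagger]\, \hat a \right)
  = -\hat a, \\
f'(\gamma) &= S^\dagger [K, \hat a] S = -S^\dagger \hat a^\dagger S, \\
f''(\gamma) &= -S^\dagger [K, \hat a^\dagger] S = S^\dagger \hat a S = f(\gamma).
\end{align*}
```

The first line fixes the initial slope $f'(0) = -\hat a^\dagger$, and the last line is the differential equation whose solutions are the hyperbolic functions.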


Problem 281 


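What is $\langle \beta | H | \alpha \rangle$, where $\alpha, \beta$ are coherent states and $H$ is the (one-dimensional) simple harmonic oscillator Hamiltonian?

We can rewrite the Hamiltonian so this expression becomes

$$\langle \beta |\, \hbar\omega \left(\hat a^\dagger \hat a + \frac12\right) | \alpha \rangle.$$

This can then be rewritten as

$$\hbar\omega \left( \langle \beta | \hat a^\dagger \hat a | \alpha \rangle + \frac12 \langle \beta | \alpha \rangle \right),$$

and now remember that $|\alpha\rangle$ is an eigenvector for $\hat a$, so we can let the $\hat a$ act on the $|\alpha\rangle$ and the $\hat a^\dagger$ act on the $\langle \beta|$ (remembering the complex conjugate):

$$\hbar\omega \left( \langle \beta | \beta^* \alpha | \alpha \rangle + \frac12 \langle \beta | \alpha \rangle \right) = \hbar\omega \left( \beta^* \alpha + \frac12 \right) \langle \beta | \alpha \rangle.$$

And now all that's left is to compute the bra-ket $\langle \beta | \alpha \rangle$, and that's given in our formula sheet: it's

$$\langle \beta | \alpha \rangle = e^{-\frac12 |\alpha - \beta|^2 + i\, \mathrm{Im}(\beta^* \alpha)}.$$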


Here's one final problem to think about on our own: 


Problem 282 


Alice and Bob share an entangled pair of particles in the singlet state. Suppose Bob has a cloning machine: how 


can Alice and Bob use this to communicate a yes-no message without sending any information? 


(This gives us a method of instantaneous communication, which is not physically allowed — that’s why we have the 
no cloning theorem.) Basically, just have Alice measure along x if the answer is “Yes” and along z if the answer is “No.” 
Afterward, Bob can clone many copies of his (now edited) state, and try measuring along the x and z-directions: only 


one of these will always yield the same answer. 


39 Addition of Angular Momentum, Part 2 


We'll begin discussing addition of angular momentum more generally now, and we'll start with the most important and 


simplest example: 


Example 283 


Consider the addition of angular momentum for two spin 1/2 particles. 


As always, we label angular momentum states with two labels. We'll use $|s, m\rangle$ here (because we have spin and not orbital angular momentum here), and we have the operator $S^2$ (analogous to $J^2$) such that

$$S^2 |s, m\rangle = \hbar^2 s(s+1) |s, m\rangle,$$

and also the operator $S_z$ (analogous to $J_z$) with

$$S_z |s, m\rangle = \hbar m |s, m\rangle.$$

So $|s, m\rangle$ are the simultaneous eigenstates for $S^2$ and $S_z$ for a single particle. When we want to introduce two particles, we'll end up with two vector spaces $V_1, V_2$ and two sets of spin operators $\vec S^{(1)}$ and $\vec S^{(2)}$: thus, we need to write things in the tensor product formalism here.


In the case with spin 1/2, the particles' individual states can be written with basis

$$\left\{ \left|\tfrac12, \tfrac12\right\rangle, \left|\tfrac12, -\tfrac12\right\rangle \right\}.$$

When we take the tensor product, we'll now have four basis states (pick one of the states for particle 1 and one of the states for particle 2), and in this space we'll have the total angular momentum operator

$$\vec S = \vec S^{(1)} \otimes 1 + 1 \otimes \vec S^{(2)}.$$

We'll organize our states with respect to this tensor product: we know that

$$\left|\tfrac12, \tfrac12\right\rangle_1 \otimes \left|\tfrac12, \tfrac12\right\rangle_2$$

has both particles spin up,

$$\left|\tfrac12, \tfrac12\right\rangle_1 \otimes \left|\tfrac12, -\tfrac12\right\rangle_2 \quad \text{and} \quad \left|\tfrac12, -\tfrac12\right\rangle_1 \otimes \left|\tfrac12, \tfrac12\right\rangle_2$$

have one particle spin up and one particle spin down, and

$$\left|\tfrac12, -\tfrac12\right\rangle_1 \otimes \left|\tfrac12, -\tfrac12\right\rangle_2$$

has both particles spin down. We'll want to look at our states in terms of total angular momentum in the $z$-direction, which we find by adding up the angular momentum of the first and second particles: since the $z$-component for a single particle $|s, m\rangle$ is $\hbar m$, we can just add up the contributions for our four states, and we find that

$$S_z = S_z^{(1)} \otimes 1 + 1 \otimes S_z^{(2)}$$

will be $\hbar$ for the first state, 0 for the two middle states, and $-\hbar$ for the last state.

Our next step is to rearrange this uncoupled basis, much like we did last lecture, so that we can have eigenstates for the total angular momentum as well. Since one of our states has $m = 1$, we need an $s = 1$ multiplet here (which gives us $m = 1, 0, -1$), and then we'll also need an $s = 0$ multiplet (which is the only value of $s$ that yields a singlet). So our total space is

$$V_1 \otimes V_2 = (s = 1) \oplus (s = 0),$$

which means we can write

$$\boxed{\left(s = \tfrac12\right) \otimes \left(s = \tfrac12\right) = (s = 1) \oplus (s = 0).}$$

So now let's go ahead and find these new basis states $|s, m\rangle$, where we're now labeling our states by total angular momentum. Remember the formula for our raising and lowering operators for a general angular momentum $J$:

$$J_\pm |j, m\rangle = \hbar\sqrt{j(j+1) - m(m \pm 1)}\, |j, m \pm 1\rangle.$$

For example,

$$J_- \left|\tfrac12, \tfrac12\right\rangle = \hbar\sqrt{\tfrac12 \cdot \tfrac32 - \tfrac12 \cdot \left(-\tfrac12\right)}\, \left|\tfrac12, -\tfrac12\right\rangle = \hbar \left|\tfrac12, -\tfrac12\right\rangle,$$

so the lowering operator acts in a simple way on the top state for $j = \frac12$. We can also find that (now looking at the top state for $j = 1$)

$$J_- |1, 1\rangle = \hbar\sqrt{1 \cdot 2 - 1 \cdot 0}\, |1, 0\rangle = \hbar\sqrt{2}\, |1, 0\rangle.$$


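So now we'll specialize to the identification for spin 1/2: we're looking for a multiplet of total spin $s = 1$, and this must contain the top state (the only state with $m = 1$)

$$|s = 1, m = 1\rangle = \left|\tfrac12, \tfrac12\right\rangle_1 \otimes \left|\tfrac12, \tfrac12\right\rangle_2.$$

Similarly, we know that we must have

$$|s = 1, m = -1\rangle = \left|\tfrac12, -\tfrac12\right\rangle_1 \otimes \left|\tfrac12, -\tfrac12\right\rangle_2,$$

because this is the only state with $m = -1$. So we need to figure out which state corresponds to $|1, 0\rangle$ and which corresponds to $|0, 0\rangle$, and we just need to use the lowering operator on $|1, 1\rangle$:

$$S_- |s = 1, m = 1\rangle = S_- \left( \left|\tfrac12, \tfrac12\right\rangle_1 \otimes \left|\tfrac12, \tfrac12\right\rangle_2 \right),$$

and because the total lowering operator can be written as $S_- = S_-^{(1)} \otimes 1 + 1 \otimes S_-^{(2)}$, we have

$$\hbar\sqrt{2}\, |s = 1, m = 0\rangle = \left( S_-^{(1)} \left|\tfrac12, \tfrac12\right\rangle_1 \right) \otimes \left|\tfrac12, \tfrac12\right\rangle_2 + \left|\tfrac12, \tfrac12\right\rangle_1 \otimes \left( S_-^{(2)} \left|\tfrac12, \tfrac12\right\rangle_2 \right),$$

and evaluating the lowering operators on the individual vector spaces with our usual formula yields

$$\hbar\sqrt{2}\, |1, 0\rangle = \hbar \left|\tfrac12, -\tfrac12\right\rangle_1 \otimes \left|\tfrac12, \tfrac12\right\rangle_2 + \hbar \left|\tfrac12, \tfrac12\right\rangle_1 \otimes \left|\tfrac12, -\tfrac12\right\rangle_2.$$

So moving the constants around, we find that

$$|s = 1, m = 0\rangle = \frac{1}{\sqrt2} \left( \left|\tfrac12, \tfrac12\right\rangle_1 \otimes \left|\tfrac12, -\tfrac12\right\rangle_2 + \left|\tfrac12, -\tfrac12\right\rangle_1 \otimes \left|\tfrac12, \tfrac12\right\rangle_2 \right),$$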


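and now we have the full $s = 1$ multiplet. To finish identifying the $s = 0$ singlet, we need to take some linear combination of the $m = 0$ states which is orthogonal to the $|1, 0\rangle$ state (this is because they're eigenstates of a Hermitian operator $S^2$ with different eigenvalues). We can achieve this by switching the sign of one of the terms above, and we'll take

$$|s = 0, m = 0\rangle = \frac{1}{\sqrt2} \left( \left|\tfrac12, \tfrac12\right\rangle_1 \otimes \left|\tfrac12, -\tfrac12\right\rangle_2 - \left|\tfrac12, -\tfrac12\right\rangle_1 \otimes \left|\tfrac12, \tfrac12\right\rangle_2 \right)$$

to get the singlet state (remember that this actually made an appearance when we talked about the Bell inequality!).

Indeed, looking at these four basis vectors, notice that they are all already normalized, and for example we can verify that (dropping the $\otimes$ for notational convenience)

$$\langle 0, 0 | 1, 0 \rangle = \frac12 \left( \left\langle\tfrac12, \tfrac12\right|_1 \left\langle\tfrac12, -\tfrac12\right|_2 - \left\langle\tfrac12, -\tfrac12\right|_1 \left\langle\tfrac12, \tfrac12\right|_2 \right) \left( \left|\tfrac12, \tfrac12\right\rangle_1 \left|\tfrac12, -\tfrac12\right\rangle_2 + \left|\tfrac12, -\tfrac12\right\rangle_1 \left|\tfrac12, \tfrac12\right\rangle_2 \right) = \frac12 (1 + 0 - 0 - 1)$$

just simplifies to 0. And there are many other tests we can do on these states: for instance, if we act with the raising operator or lowering operator on $|0, 0\rangle$, the state will be killed.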
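So we have solved our problem, but the notation is a bit complicated. Instead, we'll denote $|\frac12, \frac12\rangle = |\uparrow\rangle$ and $|\frac12, -\frac12\rangle = |\downarrow\rangle$, which will make our states look much cleaner:

$$\boxed{\begin{aligned}
|1, 1\rangle &= |\uparrow\rangle_1 \otimes |\uparrow\rangle_2 = |\uparrow\uparrow\rangle, \\
|1, 0\rangle &= \tfrac{1}{\sqrt2} \left( |\uparrow\rangle_1 \otimes |\downarrow\rangle_2 + |\downarrow\rangle_1 \otimes |\uparrow\rangle_2 \right) = \tfrac{1}{\sqrt2} \left( |\uparrow\downarrow\rangle + |\downarrow\uparrow\rangle \right), \\
|1, -1\rangle &= |\downarrow\rangle_1 \otimes |\downarrow\rangle_2 = |\downarrow\downarrow\rangle, \\
|0, 0\rangle &= \tfrac{1}{\sqrt2} \left( |\uparrow\rangle_1 \otimes |\downarrow\rangle_2 - |\downarrow\rangle_1 \otimes |\uparrow\rangle_2 \right) = \tfrac{1}{\sqrt2} \left( |\uparrow\downarrow\rangle - |\downarrow\uparrow\rangle \right).
\end{aligned}}$$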


(The first arrow will tell us about particle 1, and the second arrow about particle 2 — we've simplified the tensor product 
notation.) And now we're done with this problem: we've figured out what kind of angular momenta occur when we 


add two spin 1/2 particles. 
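As a quick sanity check on the triplet and singlet, one can apply the total $S^2$ to each state and read off $s(s+1)$. The sketch below works in units of $\hbar = 1$ and uses unnormalized states with exact rational amplitudes; the dictionary representation and the names `apply_total_s_squared` and `s_squared_eigenvalue` are assumptions made for this illustration. It relies on the identity $S^2 = S_1^2 + S_2^2 + 2S_{1z}S_{2z} + S_{1+}S_{2-} + S_{1-}S_{2+}$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
SZ = {"u": HALF, "d": -HALF}  # S_z eigenvalues of "up" and "down" in units of hbar


def apply_total_s_squared(state):
    """Apply S^2 to a two-spin state given as {(spin1, spin2): amplitude}."""
    out = {}
    for (a, b), amp in state.items():
        # Diagonal part: S1^2 + S2^2 + 2 S1z S2z.
        diag = 2 * HALF * (HALF + 1) + 2 * SZ[a] * SZ[b]
        out[(a, b)] = out.get((a, b), 0) + diag * amp
        if a != b:
            # S1+ S2- + S1- S2+ swaps an antiparallel pair with coefficient 1.
            out[(b, a)] = out.get((b, a), 0) + amp
    return out


def s_squared_eigenvalue(state):
    """Return s(s+1) if the state is an S^2 eigenstate, else raise ValueError."""
    image = apply_total_s_squared(state)
    ratios = {image.get(key, 0) / amp for key, amp in state.items() if amp != 0}
    leaked = any(v != 0 for key, v in image.items() if state.get(key, 0) == 0)
    if len(ratios) != 1 or leaked:
        raise ValueError("not an eigenstate of S^2")
    return ratios.pop()


ONE = Fraction(1)
triplet = [
    {("u", "u"): ONE},
    {("u", "d"): ONE, ("d", "u"): ONE},
    {("d", "d"): ONE},
]
singlet = {("u", "d"): ONE, ("d", "u"): -ONE}

triplet_values = [s_squared_eigenvalue(state) for state in triplet]
singlet_value = s_squared_eigenvalue(singlet)
print("triplet s(s+1):", triplet_values)
print("singlet s(s+1):", singlet_value)
```

A state such as $|\uparrow\downarrow\rangle$ alone would raise the error, which is the computational version of the statement that the uncoupled $m = 0$ states are not eigenstates of total spin.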
Example 284

An application of the problem we just considered is the hyperfine splitting in the ground state of the hydrogen atom.

We discussed some of these ideas in the previous lecture as well: physically, this happens because the electron can have a spin up or down, which means that the ground state $|n = 1, \ell = 0, m = 0\rangle$ is actually (two-fold) degenerate. If we also consider the proton's spin, our hydrogen atom now has two spin 1/2 particles. It turns out that what's relevant is the interaction between the two spins: the proton's magnetic dipole creates a magnetic field for the electron.

We'll see through this calculation that some subtle complications will come up when we try to look at the spin 1/2 addition. First of all, remember that the important quantities are the proton's magnetic dipole

$$\vec\mu_p = g_p \frac{e}{2m_p} \vec S_p$$

as well as the electron's magnetic dipole

$$\vec\mu_e = -g_e \frac{e}{2m_e} \vec S_e = -\frac{e}{m_e} \vec S_e.$$

Remember that $g_e = 2$ cancels out with the 2 in the denominator, and this $g$ factor tells us what we need to multiply the classical dipole value by to get the quantum value. Because the proton is composed of different parts, it has a weird $g_p$ constant, which is about 5.59. (Even the neutron, which is neutral overall, has a nonzero $g$ factor, because the charge isn't symmetrically distributed between its three quarks.) So now we care about the new perturbed Hamiltonian with respect to the reduced-mass nucleus:

$$\Delta H = -\vec\mu_e \cdot \vec B_p = \frac{e}{m_e} \vec S_e \cdot \vec B_p,$$

where $\vec B_p$ is the magnetic field due to the proton at the electron's current position. We know that this magnetic field due to a dipole usually has a $\frac{1}{r^3}$ dependence, but one thing we may learn in a (later) electromagnetism class is that we need to add a delta function at $r = 0$. (Intuitively, we can produce a dipole by rotating a sphere of charge and taking that radius to 0.) So the point is that we'll have

$$\vec B_p(\vec r) = \frac{\mu_0}{4\pi r^3} \left[\text{usual dot product term}\right] + \frac{2\mu_0}{3} \vec\mu_p\, \delta^3(\vec r),$$

where $\mu_0$ is the SI permeability of the vacuum.
Remember from last class that the Feynman-Hellmann theorem (in perturbation theory) tells us to look at the expectation value of this extra term $\Delta H$ to see how the energy shifts from the ground state. Because there are four degenerate states (due to up/down spins of the proton and electron), we'll need to diagonalize the matrix of $\Delta H$ matrix elements.

We'll start with the four states

$$\psi_{1,0,0} \otimes \left\{ |\uparrow\uparrow\rangle, |\uparrow\downarrow\rangle, |\downarrow\uparrow\rangle, |\downarrow\downarrow\rangle \right\},$$

where $\psi_{1,0,0}$ is the spatial (radial) wave function for the ground state, and we need to figure out how to find eigenstates of the Hamiltonian that we've introduced. All of our states have the same spatial part, so the spin part is what we need to worry about when we diagonalize.

In principle, we should just evaluate $\langle i | \Delta H | j \rangle$ in all states $i, j$, and it turns out the first $\frac{1}{r^3}$ (dot product) term vanishes whenever $\ell = 0$, so we can ignore it: only the delta function is relevant here. So plugging in our magnetic field, the relevant term that contributes to energy differences is

$$\Delta H = \frac{\mu_0 g_p e^2}{3 m_e m_p}\, \vec S_e \cdot \vec S_p\, \delta^3(\vec r).$$

If we try to evaluate this expectation value in any state $\psi$, we find that because nothing else besides $\psi$, $\psi^*$, and our delta function have spatial dependence, we'll end up with

$$\langle \psi | \Delta H | \psi \rangle = \frac{\mu_0 g_p e^2}{3 m_e m_p} |\psi_{1,0,0}(0)|^2\, \vec S_e \cdot \vec S_p$$

(the delta function means we just evaluate $\psi^*$ from the bra and $\psi$ on the ket at zero), and we can further simplify by plugging in the wave function for the ground state of hydrogen, for which $|\psi_{1,0,0}(0)|^2 = \frac{1}{\pi a_0^3}$: we end up with

$$\frac{\mu_0 g_p e^2}{3\pi m_e m_p a_0^3}\, \vec S_e \cdot \vec S_p.$$

Our next task will be to deal with this product of spin operators, and the main identity we care about is

$$\vec S_e \cdot \vec S_p = \frac12 \left( S^2 - S_e^2 - S_p^2 \right).$$

(We should remember that there are secretly tensor products here: for example, $\vec S_e \cdot \vec S_p$ is really $\sum_i S_{e,i} \otimes S_{p,i}$.) And we know the eigenvalues of $S_e^2$ and $S_p^2$ (because these are just ordinary spin 1/2 things, we'll have $\hbar^2 \cdot \frac12 \cdot \frac32 = \frac34\hbar^2$, no matter what state we're in). So again, the issue is the total angular momentum term $S^2$. We want to find states that are diagonal for the Hamiltonian contribution $\Delta H$, so we need to find the eigenstates for the total angular momentum operator $S^2$.

This means that we must turn our attention back to the triplet and singlet states that we found earlier in this lecture. These states will have eigenvalue for $\vec S_e \cdot \vec S_p$ of (plugging in the values we already know)

$$\frac12 \left( S^2 - \frac32 \hbar^2 \right).$$

Thus, the states in the triplet, corresponding to $s = 1$, will have eigenvalue for $S^2$ of $2\hbar^2$ and therefore a total eigenvalue $\frac{\hbar^2}{4}$. Meanwhile, the state in the singlet will have eigenvalue $-\frac34\hbar^2$. And because these are eigenstates for $\Delta H$, we now know our energy shifts:

$$\Delta H = \Delta E \cdot \frac{\vec S_e \cdot \vec S_p}{\hbar^2},$$

where $\Delta E$ is the difference in energy between the top and bottom splittings (because some states go up by $\frac14\Delta E$, and the other goes down by $\frac34\Delta E$, so the difference is indeed $\Delta E$). And now we know the answer we're looking for: the triplet with total angular momentum $s = 1$ goes up in energy by $\frac{\Delta E}{4}$ and the singlet goes down in energy by $\frac{3\Delta E}{4}$.


We can plug in all of the constants now:

$$\Delta E = \frac{\mu_0 g_p e^2 \hbar^2}{3\pi m_e m_p a_0^3},$$

and we can simplify this by using the fact that $a_0 = \frac{4\pi\epsilon_0\hbar^2}{m_e e^2}$ in SI units, as well as $\mu_0\epsilon_0 = \frac{1}{c^2}$, and everything simplifies to

$$\Delta E = \frac{4 g_p \hbar^4}{3 m_p m_e^2 c^2} \frac{1}{a_0^4}.$$

This is still not easy to understand, and plugging numbers in won't really tell us much. Instead, the point is to introduce the fine structure constant $\alpha = \frac{e^2}{\hbar c}$ in Gaussian units: then $a_0$ (in Gaussian units) is just $\frac{\hbar^2}{m_e e^2} = \frac{\hbar}{m_e c \alpha}$, and we'll end up with

$$\Delta E = \frac43 g_p \frac{m_e}{m_p} \alpha^4 (m_e c^2).$$

We already know the units work out: $m_e c^2$ has units of energy (in fact, it is the rest energy of the electron) and everything else is unitless. To understand why this is small, note that

$$\alpha^2 (m_e c^2) \sim \text{Bohr energy}$$

(the 13.6 eV constant), and then multiplying by another $\alpha^2 \approx \frac{1}{19000}$ gives us the spin-orbit coupling energy, which is again much smaller. Then including the ratio of masses makes this $\Delta E$ smaller still: now we can plug in all the numbers, and we end up with

$$\Delta E = 5.88 \times 10^{-6} \text{ eV}.$$


To understand the significance of this, suppose that an atom transitions between the two levels of this splitting and emits a photon. We'll get an emitted wavelength of

$$\lambda = \frac{c}{\nu} = \frac{c}{\Delta E / h} = \frac{2\pi\hbar c}{\Delta E} = \frac{2\pi \cdot 197 \text{ MeV} \cdot \text{fm}}{5.88 \times 10^{-6} \text{ eV}},$$

where 1 fm is $10^{-13}$ centimeters, and then simplifying out the units yields an answer around 21.1 centimeters. This means that the decay across this hyperfine splitting gives us a spectral line at a frequency around 1420 MHz.

But it turns out that the probability of the hydrogen decay is extremely small: the lifetime is about $3 \times 10^{14}$ seconds, which is about 10 million years. And this phenomenon has useful applications: it helps us measure how fast galaxies are rotating (by looking at how the line moves), and the line is extremely sharp because of the uncertainty arguments we made earlier in the class.
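The numbers quoted for the hyperfine line can be reproduced from the closed-form result $\Delta E = \frac43 g_p \frac{m_e}{m_p} \alpha^4 (m_e c^2)$. The following is a rough sketch: the constants are standard rounded values that are typed in here as assumptions (they are not taken from the lecture), and the formula is only the leading-order estimate derived above.

```python
import math

# Rounded physical constants (assumed inputs for this sketch).
ALPHA = 1 / 137.035999                 # fine structure constant
ELECTRON_REST_ENERGY_EV = 510_998.95   # m_e c^2 in eV
PROTON_TO_ELECTRON_MASS = 1836.15267   # m_p / m_e
G_PROTON = 5.5857                      # proton g factor
HBAR_C_EV_NM = 197.3269804             # hbar * c in eV * nm (numerically equal to MeV * fm)
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def hyperfine_splitting_ev():
    """Leading-order hyperfine splitting of the hydrogen ground state, in eV."""
    return (4 / 3) * G_PROTON / PROTON_TO_ELECTRON_MASS * ALPHA**4 * ELECTRON_REST_ENERGY_EV


delta_e_ev = hyperfine_splitting_ev()
wavelength_nm = 2 * math.pi * HBAR_C_EV_NM / delta_e_ev   # lambda = 2 pi hbar c / Delta E
wavelength_cm = wavelength_nm * 1e-7
frequency_mhz = SPEED_OF_LIGHT_M_PER_S / (wavelength_nm * 1e-9) / 1e6

print(f"Delta E    = {delta_e_ev:.3e} eV")
print(f"wavelength = {wavelength_cm:.2f} cm")
print(f"frequency  = {frequency_mhz:.0f} MHz")
```

With these inputs the sketch should land close to the values quoted above (about $5.88 \times 10^{-6}$ eV, 21.1 cm, and 1420 MHz); small differences come from rounding and from neglected higher-order corrections.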

Let's move on: we'll now try to make more general statements about adding angular momentum, and we'll develop a systematic way of discussing this kind of problem. Suppose we consider two systems whose state spaces contain all possible angular momenta: write the individual vector spaces as

$$\mathcal{H}_1 = \bigoplus_{j_1} \mathcal{H}_1^{(j_1)}, \qquad \mathcal{H}_2 = \bigoplus_{j_2} \mathcal{H}_2^{(j_2)},$$

and our goal will be to construct tensor products between $\mathcal{H}_1$ and $\mathcal{H}_2$. Remember that each value of $j$ corresponds to a subspace which we can also write as a direct sum (it's some $j$ multiplet):

$$\mathcal{H}_1^{(j_1)} = \bigoplus_{m_1} |j_1, m_1\rangle, \qquad \mathcal{H}_2^{(j_2)} = \bigoplus_{m_2} |j_2, m_2\rangle.$$

We know that we already have angular momentum operators $\vec J_1, \vec J_2$ on the two spaces, and we're going to tensor some states in $\mathcal{H}_1$ and $\mathcal{H}_2$ together. For a fixed $j_1, j_2$, consider the tensor product space

$$V_{j_1, j_2} = \mathcal{H}_1^{(j_1)} \otimes \mathcal{H}_2^{(j_2)},$$

which means we are considering a spin from each vector space. Since $j_1, j_2$ are fixed here, we can define the "sum of angular momentum" operator. To understand how it acts on the vector space $V_{j_1, j_2}$, we use a basis defined via

$$|j_1, j_2; m_1, m_2\rangle = |j_1, m_1\rangle \otimes |j_2, m_2\rangle$$

(we keep the notation with $j_1, j_2$ just to remind ourselves what space we're looking at).

Definition 285

The uncoupled basis for a vector space $V_{j_1, j_2}$ is the set of vectors $|j_1, j_2; m_1, m_2\rangle$ (where $m_1$ and $m_2$ range over all allowed values).


Because there are $2j_1 + 1$ possible values for $m_1$ and $2j_2 + 1$ possible values for $m_2$, we have that

$$\dim V_{j_1, j_2} = (2j_1 + 1)(2j_2 + 1).$$

These uncoupled basis states are relevant because they tell us the eigenvectors of operators for the individual spaces $V_1$ and $V_2$: after all, associated with these uncoupled states, we have a complete set of commuting observables, which includes $J_1^2$, $J_2^2$, $J_{1z}$, and $J_{2z}$. But we want to recognize our states in terms of the total angular momentum instead, so we need to reconstruct our basis in general to have eigenvectors of the total angular momentum operators. Now we'll define our total angular momentum operator

$$J_i = J_{1i} \otimes 1 + 1 \otimes J_{2i},$$

which acts on $V_{j_1, j_2}$, and we'll try to construct a new orthonormal basis consisting of eigenvectors of a new set of commuting observables. That set of observables will be

$$\left\{ J_1^2, J_2^2, J^2, J_z \right\}.$$

We first check that these indeed commute with each other: $J_1^2$ commutes with everything in the first vector space, and it doesn't need to interact with anything in the second vector space. Since $J^2$ and $J_z$ are built from $\vec J_1$ and $\vec J_2$, we see that $J_1^2$ commutes with everything, and we can continue this logic for the other observables.

Then we can check that we can't add other commuting observables either: for example, $J_{1z}$ won't commute with $J^2$. But ultimately, what this set of commuting observables allows us to do is to label our coupled basis states with the indices

$$|j_1, j_2; j, m\rangle.$$

(Here, $j_1$ corresponds to the eigenvalue $\hbar^2 j_1(j_1 + 1)$ of $J_1^2$, $j_2$ corresponds to that of $J_2^2$, and similarly $j$ and $m$ tell us the eigenvalues of $J^2$ and $J_z$.) So what we're claiming physically is that the total angular momentum operator keeps us inside the state space $V_{j_1, j_2}$, and we can in fact break up the space into (a direct sum of) subspaces, each of which corresponds to a specific representation of total angular momentum.

But we do need to figure out how to find the possible values of $j$ and $m$ (that is, which ones appear for a given $j_1$ and $j_2$), and we also need to understand how we get a given coupled state with some $j$ and $m$ from our uncoupled states (which have some $m_1$ and $m_2$). This means we need to understand how to start with the completeness relation

$$\sum_{m_1, m_2} |j_1, j_2; m_1, m_2\rangle \langle j_1, j_2; m_1, m_2| = 1$$

(which says that our uncoupled basis states span the space) and turn it into a statement about our coupled basis states by multiplying both sides by $|j_1, j_2; j, m\rangle$, which gives us

$$\sum_{m_1, m_2} |j_1, j_2; m_1, m_2\rangle \langle j_1, j_2; m_1, m_2 | j_1, j_2; j, m \rangle = |j_1, j_2; j, m\rangle.$$

But now this bra-ket term is just a number, so figuring out how to evaluate it will tell us how to get the coupled basis states as a linear combination of the uncoupled basis states! That's what we've been working out explicitly in the past few examples, and now we want to speak in more generality:

Definition 286

The numbers

$$\langle j_1, j_2; m_1, m_2 | j_1, j_2; j, m \rangle$$

are called the Clebsch-Gordan coefficients.

We've already seen a few examples of how to find these coefficients. It involves using raising and lowering operators to get a recursive formula, and there isn't really a simple way of finding the values without going through that computation. So there are actually tables for this, and we'll also get some practice for doing everything out ourselves, but we'll focus now on the question of when these coefficients are zero and which $j$s appear in this addition of angular momentum.


Proposition 287 


Whenever $m \neq m_1 + m_2$,

$$\langle j_1, j_2; m_1, m_2 | j_1, j_2; j, m \rangle = 0.$$

In other words, we only get a contribution from a state if the angular momenta in the $z$-direction add up properly.

Proof. We know that

$$\langle j_1, j_2; m_1, m_2 | J_z | j_1, j_2; j, m \rangle = \langle j_1, j_2; m_1, m_2 | J_{1z} + J_{2z} | j_1, j_2; j, m \rangle,$$

and because we know that our states are eigenstates of the relevant operators we've included in this equation, we can replace everything with its eigenvalue:

$$\langle j_1, j_2; m_1, m_2 | \hbar m | j_1, j_2; j, m \rangle = \langle j_1, j_2; m_1, m_2 | \hbar(m_1 + m_2) | j_1, j_2; j, m \rangle$$

(where we've had the $J_{1z} + J_{2z}$ act on the bra vector and used the fact that all eigenvalues are real). Therefore, we can move everything to one side to get

$$\hbar (m - m_1 - m_2) \langle j_1, j_2; m_1, m_2 | j_1, j_2; j, m \rangle = 0,$$

which is exactly the result we want (at least one of the terms in the product must be zero).

In other words, the quantum number $m$ is easy to deal with, and now we'll move on to understanding which $j$ values appear.


Proposition 288 


The values of $j$ that come in the addition of angular momentum are

$$|j_1 - j_2| \le j \le j_1 + j_2,$$

where we go down by 1 each time starting from $j_1 + j_2$.

This can be thought of in a "triangle inequality" way: the largest possible $j$ value we can get is if $j_1$ and $j_2$ line up, and the smallest is if they point in opposite directions. Another way to write this is that

$$j_1 \otimes j_2 = (j_1 + j_2) \oplus (j_1 + j_2 - 1) \oplus \cdots \oplus |j_1 - j_2|.$$

We can check that this is consistent with the simple cases we already have, and also it is nice that the dimensions actually match up: the dimension on the left side is $(2j_1 + 1)(2j_2 + 1)$, and if we add up the dimensions on the right side, we will also end up with that same number. We'll check that fact:


Proposition 289 


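For nonnegative integers or half-integers $j_1, j_2$, we have

$$(2j_1 + 1)(2j_2 + 1) = \sum_{j = |j_1 - j_2|}^{j_1 + j_2} (2j + 1).$$

Proof. Without loss of generality assume $j_1 \ge j_2$ (relabel otherwise). The right side is an arithmetic sequence running from $2(j_1 - j_2) + 1$ up to $2(j_1 + j_2) + 1$, so it has average $2j_1 + 1$ and a total of $2j_2 + 1$ terms, and that yields the result.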
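The rule for which $j$ values appear, together with the dimension count, is easy to check by brute force for small spins. This is a minimal sketch using exact fractions; the function names `allowed_j` and `dimensions_match` are hypothetical helpers introduced for the illustration.

```python
from fractions import Fraction


def allowed_j(j1, j2):
    """List j = j1 + j2, j1 + j2 - 1, ..., |j1 - j2| for the product j1 (x) j2."""
    j1, j2 = Fraction(j1), Fraction(j2)
    values = []
    j = j1 + j2
    while j >= abs(j1 - j2):
        values.append(j)
        j -= 1
    return values


def dimensions_match(j1, j2):
    """Check (2 j1 + 1)(2 j2 + 1) == sum over allowed j of (2 j + 1)."""
    left = (2 * Fraction(j1) + 1) * (2 * Fraction(j2) + 1)
    right = sum(2 * j + 1 for j in allowed_j(j1, j2))
    return left == right


# All spins 0, 1/2, 1, ..., 3 on both sides.
spins = [Fraction(k, 2) for k in range(7)]
all_match = all(dimensions_match(a, b) for a in spins for b in spins)

print("1 (x) 1/2 ->", [str(j) for j in allowed_j(1, Fraction(1, 2))])
print("dimension formula holds for all tested pairs:", all_match)
```

The example pair reproduces the decomposition $(\ell = 1) \otimes (s = \frac12) = (j = \frac32) \oplus (j = \frac12)$ found earlier.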


We'll finish by explaining pictorially how our uncoupled states $|j_1, j_2; m_1, m_2\rangle$ break down into the $j$-multiplets. First of all, if we take $j_1 \ge j_2$, we will draw each of the groups with a fixed $m_2$ in its own column, with vertical height corresponding to the value of $m = m_1 + m_2$. Then there are $2j_2 + 1$ total columns, each with $2j_1 + 1$ different values of $m_1$, and they're arranged in the following kind of pattern (each column is shifted down by one row relative to the previous one):

1 state, $m = j_1 + j_2$: •
2 states, $m = j_1 + j_2 - 1$: • •
⋮
$2j_2 + 1$ states, $m = j_1 - j_2$: • • ⋯ •
⋮
$2j_2 + 1$ states, $m = -(j_1 - j_2)$: • • ⋯ •
⋮
2 states, $m = -j_1 - j_2 + 1$: • •
1 state, $m = -j_1 - j_2$: •

Notice that the number of states goes up to $2j_2 + 1$, stays there for a while, and then goes back down. But now we know that $j$-multiplets are groups of these states from top to bottom, and (for example) the topmost state must be in a multiplet with $j = j_1 + j_2$. Rearrange as shown (the same rows, with the dots in each row pushed to the left so that they line up in vertical lines):

1 state, $m = j_1 + j_2$: •
2 states, $m = j_1 + j_2 - 1$: • •
⋮
$2j_2 + 1$ states, $m = j_1 - j_2$: • • ⋯ •
⋮
$2j_2 + 1$ states, $m = -(j_1 - j_2)$: • • ⋯ •
⋮
2 states, $m = -j_1 - j_2 + 1$: • •
1 state, $m = -j_1 - j_2$: •

And now the vertical lines correspond exactly to the $j$-multiplets that we want! The left-most has states ranging from $m = -(j_1 + j_2)$ to $(j_1 + j_2)$, so $j = j_1 + j_2$, and the right-most has states ranging from $m = -(j_1 - j_2)$ to $(j_1 - j_2)$, so it has $j = j_1 - j_2$. And because we've verified that the dimensions add up, this is indeed the correct set of $j$-values to use.

40 May 4, 2020


The distribution of grades on the test was pretty broad (average around 70, standard deviation around 20). We're 
nearing the end of the class now — addition of angular momentum is our final main topic. (We'll still discuss density 
matrices in the last lecture, which is a new addition to the course — they used to be in 8.06, but they've disappeared 
from the curriculum over the years.) There will be one last assignment for this Friday, and then we will have our final


exam. 


Fact 290 


We don’t cover path integrals in 8.05 or 8.06, because they don’t help very much in more elementary study and 


require lots of work to find use in quantum field theory. But we can write our term paper on them when we get 
to 8.06. 


We'll start today by discussing the basics of angular momentum. We can start by thinking about the hydrogen atom

$$H = \frac{\vec p^{\,2}}{2m} - \frac{e^2}{r},$$

where $a_0 = \frac{\hbar^2}{me^2}$ is the Bohr radius, and $E_n = -\frac{e^2}{2a_0} \frac{1}{n^2}$ gives the energy levels (where $n$ is the principal quantum number). If we let $L_i$ be the orbital angular momentum operators, then we have

$$[H, L_i] = 0$$

because the $L$s "generate rotations," the $\vec p^{\,2}$ always commutes ($\vec p$ is a vector under rotations, so $\vec p^{\,2}$ is a scalar), and then the $-\frac{e^2}{r}$ is a central potential. Meanwhile, with the new theory we've been discussing

$$[H, S_i] = 0,$$

where the $S_i$ are the electron's spin operators, because $H$ affects the spatial component of the wavefunction, not the spin. Therefore, the two operators act on different factors of the tensor product space. One way we can write this is that $H = \left( \frac{\vec p^{\,2}}{2m} - \frac{e^2}{r} \right) \otimes 1_{2 \times 2}$, but we can also suppress the tensor product symbol itself.

So we want to come up with a complete set of commuting observables (to give us freedom in diagonalizing multiple operators at once). We always want energy eigenstates, so we always want to include $H$, and we want $L^2$ and $L_z$ because introducing $L_z$ allows us to label our states with values of $m$, and introducing $L^2$ allows us to label with values of $\ell$ (so that we can get a state $|\ell, m\rangle$). And if we have a spin for the electron, we also need to introduce $S^2$ and $S_z$.

It might seem like $S^2$ is trivial or that it isn't useful (any spin 1/2 state has eigenvalue $\hbar^2 \cdot \frac12 \cdot \frac32 = \frac34\hbar^2$), but the purpose is to start developing a system for adding together angular momentum, and we don't quite need to label all of our states with a value of $s$ (since it's always $\frac12$ in this case). So we just label our states in the hydrogen atom + spin system with

$$|n, \ell, m, m_s\rangle,$$

which fully characterizes our new system.

We know that the $\ell = 0$ states start with $n = 1$, the $\ell = 1$ states start with $n = 2$, and so on. But remember that we have multiplets for each $(n, \ell)$. Whenever $\ell = 0$, there are two states (up and down for the spin). Then whenever $\ell = 1$, we have six states (three possibilities for $m$, two possibilities for $m_s$), and whenever $\ell = 2$, there are ten states. And each of these states can be thought of as eigenstates for our operators, but interpreting what these quantum numbers $n, \ell, m$ mean also has significance in chemistry.

To analyze this system more, note that we're using uncoupled basis states: the spin and orbital angular momentum are not being related to each other, and in fact nothing is really talking to the $m_s$. But there is in fact a correction we should be making to the Hamiltonian, related to the fine structure, which comes up because of relativistic effects: as we saw in lecture, spin-orbit coupling means that some of our states in each multiplet $(n, \ell)$ move up or down in energy and split apart.

To study this more complicated system, we need to be more careful about which operators actually commute. We now have a new (fine-structure) Hamiltonian

$$H_T = H + H_{\text{f.s.}},$$

where we've introduced an $\vec S \cdot \vec L$ term, and we now care about whether our operators still commute. $[H_T, L^2] = 0$ works out, because $L^2$ commutes with everything made up of $S$s and $L$s, and similarly $[H_T, S^2] = 0$. But we should remember that there are secretly tensor products everywhere:

$$\vec S \cdot \vec L = S_x \otimes L_x + S_y \otimes L_y + S_z \otimes L_z,$$

and now we know that we do not have $[H_T, L_z] = 0$: even though $L_z$ commutes with the original Hamiltonian, the different $L_i$ operators don't commute with each other. And similarly $[H_T, S_z] \neq 0$, so our list of commuting observables only contains $\{H_T, L^2, S^2\}$ right now, and we need more to properly characterize our states.


So here's where addition of angular momentum comes in: we construct the operator

$$J^2 = (\mathbf{L} + \mathbf{S})^2 = L^2 + S^2 + 2\,\mathbf{L}\cdot\mathbf{S}.$$

We can now check whether our new Hamiltonian commutes with $J_i = L_i + S_i$. The original Hamiltonian commutes with
all of these operators, so we just need to check whether

$$[\mathbf{L}\cdot\mathbf{S},\; L_i + S_i] \overset{?}{=} 0.$$

But we can solve for the dot product: we know that $2\,\mathbf{S}\cdot\mathbf{L} = J^2 - L^2 - S^2$, so it's equivalent to ask whether

$$[J^2 - L^2 - S^2,\; J_i] = 0.$$


$L^2$ and $S^2$ commute with everything here, and $J^2$ commutes with $J_i$ by the algebra of angular momentum. So this
does indeed work out, and now we can expand our set of commuting operators to

$$\{H_T,\; L^2,\; S^2,\; J^2,\; J_z\}.$$

These five operators mean that we can label our states via

$$|n, \ell, j, m_j\rangle$$

(again we suppress $s = \frac{1}{2}$ because it's the same for all states). These are the labels for the observables relevant to
our perturbed Hamiltonian, and now we have coupled basis states: we keep $\ell$, but we replace $m, m_s$ (the individual
azimuthal quantum numbers) with $j, m_j$ (the numbers related to the addition of angular momentum). As we have seen
in the lectures, this allows us to make statements like

$$\ell \otimes \tfrac{1}{2} = \left(\ell + \tfrac{1}{2}\right) \oplus \left(\ell - \tfrac{1}{2}\right),$$

where the left side represents $(m, m_s)$ representations and the right side represents $(j, m_j)$ representations. And then
we use the notation $(nL_j)$ to denote these subspaces, using the letters $S, P, D, \ldots$ for $\ell = 0, 1, 2, \ldots$. For example,
the ten states at $n = 3$, $\ell = 2$ organize themselves into a $3D_{5/2}$ group and a $3D_{3/2}$ group. And we'll learn in 8.06 that
the perturbations of energy $\Delta E$ are actually just functions of $n$ and $j$ alone.
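The commutator claims in this section can be checked numerically on a small example. The sketch below is not from the lecture: it assumes $\ell = 1$ and $s = \frac{1}{2}$, sets $\hbar = 1$, orders the tensor factors as orbital first and spin second, and uses a hypothetical helper `angular_momentum` to build the standard spin matrices.

```python
import numpy as np

def angular_momentum(spin):
    """Return (Jx, Jy, Jz) for the given spin, in units of hbar.

    Basis ordering is m = spin, spin - 1, ..., -spin.
    """
    dim = int(round(2 * spin)) + 1
    m = spin - np.arange(dim)
    jz = np.diag(m).astype(complex)
    # Raising operator: J+ |m> = sqrt(j(j+1) - m(m+1)) |m+1>
    jplus = np.zeros((dim, dim), dtype=complex)
    for k in range(1, dim):
        jplus[k - 1, k] = np.sqrt(spin * (spin + 1) - m[k] * (m[k] + 1))
    jminus = jplus.conj().T
    jx = (jplus + jminus) / 2
    jy = (jplus - jminus) / (2 * 1j)
    return jx, jy, jz

def commutator(a, b):
    return a @ b - b @ a

# Orbital l = 1 (3 states) tensored with spin 1/2 (2 states): 6 states total.
L = [np.kron(op, np.eye(2)) for op in angular_momentum(1)]
S = [np.kron(np.eye(3), op) for op in angular_momentum(0.5)]

L_dot_S = sum(l @ s for l, s in zip(L, S))
Jz = L[2] + S[2]

norm_with_Lz = np.linalg.norm(commutator(L_dot_S, L[2]))
norm_with_Jz = np.linalg.norm(commutator(L_dot_S, Jz))
eigenvalues = np.round(np.linalg.eigvalsh(L_dot_S), 8)

print("||[L.S, L_z]|| =", norm_with_Lz)  # nonzero
print("||[L.S, J_z]|| =", norm_with_Jz)  # zero up to rounding
print("eigenvalues of L.S (units of hbar^2):", eigenvalues)
```

The six eigenvalues of $\mathbf{L}\cdot\mathbf{S}$ come out as a group of four and a group of two, which is the $j = \frac{3}{2}$ and $j = \frac{1}{2}$ split of the $\ell = 1$ multiplet.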


41 Addition of Angular Momentum, Part 3 


We'll start with a review of some important ideas: recall that $j_1 \otimes j_2$ means we have some states in a
multiplet of an angular momentum $\mathbf{J}_1$ with quantum number $j_1$, and we also have some states in a multiplet of $\mathbf{J}_2$ with quantum number $j_2$, where these two
(commuting) angular momenta act on different particles or on different degrees of freedom of a single particle. The key
identity to remember is that

$$j_1 \otimes j_2 = (j_1 + j_2) \oplus (j_1 + j_2 - 1) \oplus \cdots \oplus |j_1 - j_2|,$$

where all representations on the right live in the tensor product space and are multiplets of our new angular momentum
$\mathbf{J} = \mathbf{J}_1 + \mathbf{J}_2$. The basis states on the left form the uncoupled basis, and the basis states on the right form the coupled
basis.
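As a small illustration of the identity above, the following sketch lists the multiplets appearing in $j_1 \otimes j_2$ and checks that the dimensions on both sides agree. The function names `couple` and `dim` are made up for this example; exact fractions are used so that half-integers are not rounded.

```python
from fractions import Fraction

def couple(j1, j2):
    """Total-j values in j1 (x) j2: j1+j2, j1+j2-1, ..., |j1-j2|."""
    j1, j2 = Fraction(j1), Fraction(j2)
    totals, j = [], j1 + j2
    while j >= abs(j1 - j2):
        totals.append(j)
        j -= 1
    return totals

def dim(j):
    """Number of states in a spin-j multiplet."""
    return int(2 * Fraction(j) + 1)

HALF = Fraction(1, 2)
for j1, j2 in [(HALF, HALF), (2, HALF), (1, 1)]:
    totals = couple(j1, j2)
    # Uncoupled and coupled bases must span the same space.
    assert sum(dim(j) for j in totals) == dim(j1) * dim(j2)
    print(f"{j1} (x) {j2} = " + " (+) ".join(str(j) for j in totals))
```

For instance, `couple(2, Fraction(1, 2))` returns the values 5/2 and 3/2, whose multiplets have 6 + 4 = 10 states in total.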
Our first goal for this lecture will be to understand the spectrum of the hydrogen atom. Recall that the spectrum 


when we don't care about spin looks like the following: 


Here, the energies at level $n$ are

$$E_n = -\frac{e^2}{2a_0}\frac{1}{n^2},$$

where $a_0$ is the usual Bohr radius, and for each $n$ we have states for each of $\ell = 0, 1, 2, \ldots, (n-1)$. This means that
for each level $n$, we have a total of $n^2$ energy states (we can verify this by adding up the states for each $\ell$). Our current
goal is to understand why we have these $n^2$ states in this configuration, and we'll need to return to the Runge-Lenz
vector to do that.
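The count of states at level $n$ can be written out explicitly. Each value of $\ell$ contributes a multiplet of $2\ell + 1$ states (spin is ignored here), so the sum is an arithmetic series:

```latex
\sum_{\ell=0}^{n-1} (2\ell + 1)
  = 2 \cdot \frac{(n-1)\,n}{2} + n
  = n^2 .
```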

Recall that the Hamiltonian and Runge-Lenz vector we are working with are

$$H = \frac{\mathbf{p}^2}{2m} - \frac{e^2}{r}, \qquad \mathbf{R} = \frac{1}{2me^2}\left(\mathbf{p}\times\mathbf{L} - \mathbf{L}\times\mathbf{p}\right) - \frac{\mathbf{r}}{r}.$$

Here, recall that $\mathbf{R}$ is a constant, unitless vector which points in a fixed direction: classically, that direction is the
major axis of the elliptical orbit. Remember that the classical expression only involves $\mathbf{p}\times\mathbf{L}$: the corrections we make
above account for the fact that $\mathbf{L}$ and $\mathbf{p}$ don't commute as operators, and they also make sure that we have

$$[H, \mathbf{R}] = 0.$$
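There are a few other useful properties for this vector operator as well: note that we have the identity

$$\mathbf{p}\times\mathbf{L} = -\mathbf{L}\times\mathbf{p} + 2i\hbar\,\mathbf{p},$$

and we can plug this into our expression for $\mathbf{R}$ (in either direction) to find

$$\mathbf{R} = \frac{1}{me^2}\left(\mathbf{p}\times\mathbf{L} - i\hbar\,\mathbf{p}\right) - \frac{\mathbf{r}}{r} = \frac{1}{me^2}\left(-\mathbf{L}\times\mathbf{p} + i\hbar\,\mathbf{p}\right) - \frac{\mathbf{r}}{r}.$$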


But we still need to understand this conserved quantity better, and here is where we do something trickier. Since $\mathbf{R}$
is conserved, so is $\mathbf{R}^2$, and doing out the computation yields the conserved quantity

$$\mathbf{R}^2 = 1 + \frac{2H}{me^4}\left(L^2 + \hbar^2\right).$$

(Indeed, $H$ and $L^2$ are both conserved, so everything checks out here.) To make more progress, we'll need to learn a
bit more about these operators we're constructing: in order to relate the Runge-Lenz vector to something we already
know well, let's try to evaluate $\mathbf{R}\cdot\mathbf{L}$. Remember that we already showed earlier in the class that

$$\mathbf{r}\cdot\mathbf{L} = \mathbf{p}\cdot\mathbf{L} = 0$$


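(this was clear classically but required a bit more symbol pushing in the quantum case), which means that most terms
in

$$\mathbf{R}\cdot\mathbf{L} = \left(\frac{1}{me^2}\left(\mathbf{p}\times\mathbf{L} - i\hbar\,\mathbf{p}\right) - \frac{\mathbf{r}}{r}\right)\cdot\mathbf{L}$$

disappear immediately: we're just left with the term that is proportional to $(\mathbf{p}\times\mathbf{L})\cdot\mathbf{L}$. If there aren't any identities
that immediately come to mind, we can just bash this with index notation: we get

$$(\mathbf{p}\times\mathbf{L})_i L_i = \epsilon_{ijk}\, p_j L_k L_i.$$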


From here, it's tempting to say that $k$ and $i$ are symmetric in the $L$ operators while $\epsilon_{ijk}$ is antisymmetric, so everything
cancels out. The expression is indeed zero, but the explanation is incorrect: remember that $L_k$ and $L_i$ don't commute!
So we need to be more careful: we can write this expression as

$$= \epsilon_{jki}\, p_j L_k L_i = p_j (\mathbf{L}\times\mathbf{L})_j,$$

and now remember that we have the commutation relation $\mathbf{L}\times\mathbf{L} = i\hbar\mathbf{L}$, so this then simplifies to

$$= \mathbf{p}\cdot(i\hbar\mathbf{L}) = 0.$$

So all of the terms in the expression $\mathbf{R}\cdot\mathbf{L}$ vanish, and we're left with $\boxed{\mathbf{R}\cdot\mathbf{L} = 0}$. This doesn't necessarily imply that
$\mathbf{L}\cdot\mathbf{R} = 0$, but we'll see shortly that this is also true.


To proceed, recall that we have the important defining property for a vector under rotation

$$[L_i, V_j] = i\hbar\,\epsilon_{ijk} V_k,$$

which can be rewritten in terms of cross products as

$$(\mathbf{L}\times\mathbf{V} + \mathbf{V}\times\mathbf{L})_i = \epsilon_{ijk}(L_j V_k + V_j L_k) = \epsilon_{ijk}(L_j V_k - V_k L_j),$$

where we've swapped $j$ and $k$ in the second term at the cost of a negative sign in the $\epsilon_{ijk}$ symbol. But now this is just
a commutator

$$(\mathbf{L}\times\mathbf{V} + \mathbf{V}\times\mathbf{L})_i = \epsilon_{ijk}[L_j, V_k] = i\hbar\,\epsilon_{ijk}\epsilon_{jkl} V_l,$$

where we've used the vector under rotations property in the last step. And now we can use the identity when we have
two such $\epsilon$ symbols: we first reorder to get

$$= i\hbar\,\epsilon_{jki}\epsilon_{jkl} V_l = 2i\hbar\,\delta_{il} V_l = 2i\hbar V_i.$$


Since the relation holds for all indices $i$, this actually gives us the identity

$$\mathbf{L}\times\mathbf{V} + \mathbf{V}\times\mathbf{L} = 2i\hbar\mathbf{V}$$

whenever $\mathbf{V}$ is a vector under rotations. And we can apply this to our Runge-Lenz vector $\mathbf{R}$, because the cross
product of two vectors under rotations is still a vector under rotations, meaning all three terms of $\mathbf{R}$ are vectors under
rotations. So we know that

$$\mathbf{L}\times\mathbf{R} + \mathbf{R}\times\mathbf{L} = 2i\hbar\mathbf{R},$$

and now this is getting us towards the commutation relations that we want. Writing out the vector property index
by index yields

$$[L_i, R_j] = i\hbar\,\epsilon_{ijk} R_k,$$

and now we finally know why $\mathbf{R}\cdot\mathbf{L}$ and $\mathbf{L}\cdot\mathbf{R}$ are actually equal: whenever we plug in $i = j$, we have $[L_i, R_i] = 0$, so
each pair of operators in the dot product $L_1 R_1 + L_2 R_2 + L_3 R_3$ commutes. So we have indeed checked that $\boxed{\mathbf{L}\cdot\mathbf{R} = 0}$.


But now we'll turn our attention to the last set of commutators we haven't considered, which is $[R_i, R_j]$. Doing
this by brute force is difficult: there are lots of terms, and remember that operators like $\mathbf{r}$ don't commute with $\mathbf{p}$. So
what we'll do is make an argument to show what the answer can be, and then we'll be left with an easier calculation
(which we can verify on our own). Essentially, our goal is to compute

$$\mathbf{R}\times\mathbf{R}.$$

Here there's no real reason that we should expect $\mathbf{R}$ to be an angular momentum, since its expression is more complicated
than that for $\mathbf{L}$.


Proposition 291 


We know that $\mathbf{R}\times\mathbf{R}$ is a vector, and in fact the components $[R_i, R_j]$ should be proportional to a conserved
quantity.


This line of reasoning is interesting: if $S_1$ and $S_2$ are symmetries, meaning that $[S_1, H] = [S_2, H] = 0$, then $[S_1, S_2]$
is also a symmetry, that is, $[[S_1, S_2], H] = 0$. This follows from the Jacobi identity

$$[A, [B, C]] + [B, [C, A]] + [C, [A, B]] = 0$$

for operators $A, B, C$ (where we plug in $S_1$, $S_2$, and $H$). So we can keep taking commutators to get new conserved
quantities, and sometimes (when we're lucky) we get all of the conserved quantities in our system.

So in this system, the conserved vectors are $\mathbf{L}$, $\mathbf{R}$, and $\mathbf{L}\times\mathbf{R}$, but it's possible that $\mathbf{R}\times\mathbf{R}$ is proportional to some
linear combination of these. Here we'll use a trick by Schwinger (who also invented the trick for the two-dimensional
harmonic oscillator): if we do a parity transformation and replace $\mathbf{r}$ with $-\mathbf{r}$, then the momentum $\mathbf{p}$ also changes
sign (because it is related to the rate of change of $\mathbf{r}$), and $\mathbf{L} = \mathbf{r}\times\mathbf{p}$ stays fixed. This means that $\mathbf{R}$ changes sign,
because exactly one operator in each term of its definition changes sign, and therefore $\mathbf{L}\times\mathbf{R}$ changes sign as well.
But now notice that $\mathbf{R}\times\mathbf{R}$ will not change sign (both $\mathbf{R}$'s pick up a negative sign), so out of the operators that
could potentially be our conserved quantities, only $\mathbf{L}$ is left! So

$$\mathbf{R}\times\mathbf{R} \propto \mathbf{L},$$

and we can derive the constants to find that our identity is

$$\mathbf{R}\times\mathbf{R} = i\hbar\left(-\frac{2H}{me^4}\right)\mathbf{L}.$$

So now we know all of the relations between $\mathbf{L}$ and $\mathbf{R}$, including the commutators and products, and now we want
to apply this to our hydrogen atom problem. The idea is that we're going to come up with two sets of angular
momenta, even though $\mathbf{R}$ is not an angular momentum.


Proposition 292 


We'll start by restricting our problem to a specific subspace of degenerate energy: this subspace can have one, 


two, or more states, but we'll analyze this problem at some fixed energy. 


The reason this is a valuable approach is that the operator $\mathbf{R}^2$ has an $H$ term, and because $H$ commutes with
all of our operators here, we can always treat $H$ as a constant (the energy eigenvalue that we're working with in our
degenerate subspace), and that will make our calculations easier. So from here on, we'll consider the fixed energy

$$H = -\frac{me^4}{2\hbar^2}\frac{1}{\nu^2},$$

where $\nu$ is some arbitrary real number. (We write the energy in this specific way because we know that $\nu$ will end up
being an integer, and this will make our algebra easier later on.) So now we have that

$$\frac{2H}{me^4} = -\frac{1}{\hbar^2\nu^2},$$

which means we have the simpler-looking formulas

$$\mathbf{R}\times\mathbf{R} = \frac{i}{\hbar\nu^2}\mathbf{L}, \qquad \mathbf{R}^2 = 1 - \frac{1}{\hbar^2\nu^2}\left(L^2 + \hbar^2\right).$$

To make this look even nicer, we can put an $\hbar\nu$ next to each $\mathbf{R}$, which yields

$$(\hbar\nu\mathbf{R})\times(\hbar\nu\mathbf{R}) = i\hbar\mathbf{L}, \qquad \hbar^2\nu^2\mathbf{R}^2 = \hbar^2(\nu^2 - 1) - L^2.$$

Writing this in terms of indices and commutators, this also means that we have

$$[\hbar\nu R_i, \hbar\nu R_j] = i\hbar\,\epsilon_{ijk} L_k,$$

which follows from the cross product formula in the same way that $[L_i, L_j] = i\hbar\,\epsilon_{ijk}L_k$ follows from the identity $\mathbf{L}\times\mathbf{L} = i\hbar\mathbf{L}$.
And now we're ready to introduce our two angular momenta: we want to take $\hbar\nu\mathbf{R}$, which has the right units, and
add it to something else to get an angular momentum.


Definition 293 


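Define the angular momenta

$$\mathbf{J}_1 = \tfrac{1}{2}\left(\mathbf{L} + \hbar\nu\mathbf{R}\right), \qquad \mathbf{J}_2 = \tfrac{1}{2}\left(\mathbf{L} - \hbar\nu\mathbf{R}\right).$$

We don't actually know that these are angular momenta yet, but the units match up and we have some hope. (We
can recover $\mathbf{L}$ by adding $\mathbf{J}_1$ and $\mathbf{J}_2$, and we can recover $\hbar\nu\mathbf{R}$ by subtracting them.)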


First of all, note that $\mathbf{J}_1$ and $\mathbf{J}_2$ commute with each other:

$$[J_{1i}, J_{2j}] = \tfrac{1}{4}\,[L_i + \hbar\nu R_i,\; L_j - \hbar\nu R_j],$$

and now we can expand out the commutator and use the commutators we already know to get

$$= \tfrac{1}{4}\left(i\hbar\,\epsilon_{ijk}L_k - \hbar\nu[L_i, R_j] + \hbar\nu[R_i, L_j] - i\hbar\,\epsilon_{ijk}L_k\right).$$

The first and last terms cancel, and the middle two terms also cancel out (because $-[L_i, R_j] + [R_i, L_j] = -\left([L_i, R_j] + [L_j, R_i]\right)$,
and both of these commutators are proportional to $R_k$ but with opposite signs in the $\epsilon_{ijk}$ symbol). So we have verified that the
commutator of $J_{1i}$ and $J_{2j}$ is zero, meaning they commute for all $i, j$.

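Now we need to show that $\mathbf{J}_1$ and $\mathbf{J}_2$ are indeed angular momenta by showing that they form the appropriate
algebra. It suffices to calculate $\mathbf{J}_1\times\mathbf{J}_1$ and $\mathbf{J}_2\times\mathbf{J}_2$: this yields

$$\tfrac{1}{4}\left(\mathbf{L} \pm \hbar\nu\mathbf{R}\right)\times\left(\mathbf{L} \pm \hbar\nu\mathbf{R}\right),$$

and now this isn't too bad to work with, because we already have all of our formulas for products of $\mathbf{L}$ and $\mathbf{R}$. Plugging
in the expressions we've derived for each of the terms here, we end up with

$$\tfrac{1}{4}\left(i\hbar\mathbf{L} \pm \left(\mathbf{L}\times\hbar\nu\mathbf{R} + \hbar\nu\mathbf{R}\times\mathbf{L}\right) + i\hbar\mathbf{L}\right) = \tfrac{1}{4}\left(2i\hbar\mathbf{L} \pm 2i\hbar\cdot\hbar\nu\mathbf{R}\right).$$

And indeed, this simplifies to

$$= i\hbar\cdot\tfrac{1}{2}\left(\mathbf{L} \pm \hbar\nu\mathbf{R}\right),$$

which is either $i\hbar\mathbf{J}_1$ or $i\hbar\mathbf{J}_2$ based on the sign we chose at the beginning. So we've indeed shown that we have two
independent angular momenta in the hydrogen atom!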


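And now we're almost done. We know how to write $\mathbf{L}$ and $\mathbf{R}$ in terms of our angular momenta, so

$$\mathbf{L}\cdot\hbar\nu\mathbf{R} = 0 \implies (\mathbf{J}_1 + \mathbf{J}_2)\cdot(\mathbf{J}_1 - \mathbf{J}_2) = 0.$$

Because $\mathbf{J}_1$ and $\mathbf{J}_2$ commute, the cross terms will cancel, and this yields

$$\boxed{\mathbf{J}_1^2 = \mathbf{J}_2^2}.$$

The squares being equal is interesting, and if we square the definition of $\mathbf{J}_1$, we get

$$\mathbf{J}_1^2 = \tfrac{1}{4}\left(L^2 + \hbar^2\nu^2\mathbf{R}^2\right)$$

(again the cross terms cancel because $\mathbf{L}\cdot\mathbf{R} = \mathbf{R}\cdot\mathbf{L} = 0$), and now we can use our expression for $\hbar^2\nu^2\mathbf{R}^2$ to find that this is

$$= \tfrac{1}{4}\left(L^2 + \hbar^2(\nu^2 - 1) - L^2\right) = \boxed{\tfrac{1}{4}\hbar^2(\nu^2 - 1)}.$$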


And now we've actually solved our problem! We've been working with a degenerate energy subspace in which we have
two angular momenta with equal squares: in such a system, we have angular momentum eigenstates with eigenvalues
for $\mathbf{J}_1^2$ and $\mathbf{J}_2^2$ being $\hbar^2 j(j+1)$ (where $j$ is a nonnegative integer or half-integer). Therefore,

$$\tfrac{1}{4}\hbar^2(\nu^2 - 1) = \hbar^2 j(j+1),$$

and solving for $\nu$ yields

$$\nu^2 = 1 + 4j(j+1) = (2j+1)^2 \implies \nu = 2j + 1.$$


This means that because $j$ takes on one of the values $0, \frac{1}{2}, 1, \ldots$, $\nu$ must take on one of the values $1, 2, 3, \ldots$,
respectively! (This means we can identify it with the principal quantum number $n$ we've been using in the hydrogen
atom.) And because $n$ characterized the allowed energies of our hydrogen atom, we have indeed ended up with the
correct allowed energies, which are $-\frac{me^4}{2\hbar^2}\frac{1}{n^2}$ for integer $n$. Notice that in this problem, we are not using a spin for
the proton or electron: the angular momenta have just popped out of the representations of our eigenstates. And
now we can even recover the structure in the picture above for our spectrum: we've invented the degenerate subspace
consisting of vectors

$$|j, m_1\rangle \otimes |j, m_2\rangle,$$

where we use the same $j$ for both operators $\mathbf{J}_1$ and $\mathbf{J}_2$ because their squares are equal. So this is just the tensor
product of a $j$-multiplet with another $j$-multiplet, and we know how those play out:

$$j \otimes j = (2j) \oplus (2j - 1) \oplus \cdots \oplus 0.$$

And the subspaces on the right are multiplets of the total angular momentum $\mathbf{J}_1 + \mathbf{J}_2$, which is exactly our ordinary
angular momentum $\mathbf{L}$! So everything falls into place: we have that $n = 2j + 1$, and the energy eigenspace at energy
level $n$ has states with $\ell = 0$ up to $\ell = 2j = n - 1$, which is exactly what we want.
We've now seen how we can use the Runge-Lenz vector to construct angular momenta, which helps us predict the
structure of every level of the hydrogen atom. Remember that for each principal quantum level $n$, we have states that
run from $\ell = 0$ to $\ell = n - 1$, and the Runge-Lenz vector actually helps us move across the various $\ell$-multiplets.

Remember that this solution is only valid for the original hydrogen Hamiltonian — the degeneracy is broken once 
we add the fine structure from the spin-orbit coupling. When we add this extra term to the Hamiltonian, all of the 
energy levels adjust according to the total angular momentum. 

We'll have a lecture about density matrices for next week, which we should read. It’s still part of our course, but 
we won't have any homework problems on it, so we'll only get conceptual questions about it on the final. (The final 
will not be completely cumulative — it will focus on the later part of the course.) 


Today, we'll start with a conceptual discussion related to one of the problem set problems: 


Example 294 


Suppose we have a particle $X$, in the rest frame of the lab, which decays into two particles $A$ and $B$ (for instance,
a pion $\pi^0$ decaying into two photons $2\gamma$).


We can consider the angular momenta of our particles before and after the decay. Even though $X$ is at rest, it
may have some spin $\mathbf{S}_X$, and similarly particles $A$ and $B$ have some spins $\mathbf{S}_A, \mathbf{S}_B$. We expect that the total angular
momentum is conserved: we can define the quantity

$$\mathbf{J} = \begin{cases} \mathbf{S}_X & t < 0, \\ \mathbf{S}_A + \mathbf{S}_B + \mathbf{L} & t > 0. \end{cases}$$

Basically, because our particle $X$ is sitting at the center of our frame of reference, there is no orbital angular momentum
at first. But after the decay, it's possible the two particles also have some orbital angular momentum $\mathbf{L}$ (like with the
hydrogen proton and electron) in addition to the spin. We can put all of that into a single quantity $\mathbf{J}$, much like we
did with the spin-orbit coupling problem. (And the main thing to keep in mind is that orbital angular momentum isn't
always conserved, only the total angular momentum!)


Let's suppose that we are in some initial eigenstate

$$|S_X, m_{S_X}\rangle,$$

so that $J^2$ has eigenvalue $\hbar^2 S_X(S_X + 1)$ and $J_z$ has eigenvalue $\hbar m_{S_X}$. Then these eigenvalues must be the same after
the decay as well.


Problem 295 


Suppose someone claims a neutron decays into a proton and an electron. 


The first thing we can check is energy conservation — it is indeed possible, because the neutron is slightly heavier 
than the proton. Momentum conservation also looks possible — we just need to pick the velocities of our particles 
accordingly. And charge conservation holds too, so everything here looks like it is consistent so far. 

But it turns out this can't actually happen: we know that the neutron, proton, and electron are all spin 1/2 particles,
meaning they can be in the states $|\frac{1}{2}, \frac{1}{2}\rangle$ or $|\frac{1}{2}, -\frac{1}{2}\rangle$. The total angular momentum after the decay, $\mathbf{S}_p + \mathbf{S}_e + \mathbf{L}$,
must be the same as the total angular momentum before the decay. A typical state of the decay product will look like

$$|\tfrac{1}{2}, m_p\rangle \otimes |\tfrac{1}{2}, m_e\rangle \otimes |\ell, m\rangle,$$

where $m_p, m_e$ are the azimuthal quantum numbers for the proton and electron, respectively, and $\ell, m$ characterize the
orbital angular momentum. More abstractly, our states live in the tensor product space

$$\tfrac{1}{2} \otimes \tfrac{1}{2} \otimes \ell.$$

We can simplify this product: the tensor product is associative, so we can look at $\frac{1}{2} \otimes \frac{1}{2}$ first (it evaluates to $1 \oplus 0$),
so we end up with

$$= (1 \oplus 0) \otimes \ell = (1 \otimes \ell) \oplus (0 \otimes \ell)$$

(where we've now used distributivity), and this finally evaluates to

$$= (\ell + 1) \oplus \ell \oplus (\ell - 1) \oplus \ell$$

(for $\ell \geq 1$; when $\ell = 0$ the product is just $1 \oplus 0$). But we need to get a state of total angular momentum $\frac{1}{2}$, and no matter what the value of $\ell$ is, the angular momenta
will always be integers after the decay! So conservation of angular momentum doesn't work, and this decay is not possible.
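The argument above can be checked by brute force for a few values of $\ell$. This is only a sketch: `couple` and `couple_many` are hypothetical helper names, and the loop stops at a small $\ell$ purely for illustration.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def couple(j1, j2):
    """Total angular momenta in j1 (x) j2: j1+j2, j1+j2-1, ..., |j1-j2|."""
    j1, j2 = Fraction(j1), Fraction(j2)
    totals, j = [], j1 + j2
    while j >= abs(j1 - j2):
        totals.append(j)
        j -= 1
    return totals

def couple_many(spins):
    """Decompose a tensor product of several multiplets, one factor at a time."""
    totals = [Fraction(spins[0])]
    for s in spins[1:]:
        totals = [t for prev in totals for t in couple(prev, s)]
    return totals

for ell in range(4):
    two_spins = couple_many([HALF, HALF, ell])          # proton + electron
    three_spins = couple_many([HALF, HALF, HALF, ell])  # plus a third spin 1/2
    print(f"l={ell}: two spins -> {[str(t) for t in sorted(set(two_spins))]}, "
          f"three spins -> {[str(t) for t in sorted(set(three_spins))]}")
```

With two spin 1/2 particles every total is an integer, so the neutron's spin 1/2 can never be matched. Adding a third spin 1/2 particle makes every total a half-integer, and for small $\ell$ the value 1/2 itself appears.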


Fact 296 


Physicists initially thought that such a decay was observed, but we can add a particle called an antineutrino to
the products, and now the decay works: our tensor product space becomes

$$\tfrac{1}{2} \otimes \tfrac{1}{2} \otimes \tfrac{1}{2} \otimes \ell,$$

which does have half-integer angular momenta.


Remark 297. We may hear about “highly forbidden processes,” which can only happen from the action of a highly
suppressed operator. In those cases, the process may still occur very rarely, but in this case we're saying this particular
process will never happen.


Problem 298 


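Suppose we have three spin 1/2 particles with the Hamiltonian

$$H = \frac{\Delta}{\hbar^2}\left(\mathbf{S}_1\cdot\mathbf{S}_2 + \mathbf{S}_2\cdot\mathbf{S}_3 + \mathbf{S}_3\cdot\mathbf{S}_1\right).$$

What are the explicit energy eigenstates?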


First of all, we know the dimensionality of our state space is $2 \times 2 \times 2 = 8$. To get the highest contribution to this
Hamiltonian, we should have all of our spins point in the same direction, meaning that $|{+}{+}{+}\rangle$ and $|{-}{-}{-}\rangle$ should
have the highest energy eigenvalue. We can see, for example, that the operator $\mathbf{S}_1\cdot\mathbf{S}_2$ acting on $|{+}{+}{+}\rangle$ yields

$$\mathbf{S}_1\cdot\mathbf{S}_2\,|{+}{+}{+}\rangle = \left(\tfrac{1}{2}S_{1+}S_{2-} + \tfrac{1}{2}S_{1-}S_{2+} + S_{1z}S_{2z}\right)|{+}{+}{+}\rangle,$$

but now the first two terms both kill the state (because we can't raise the $|+\rangle$ state), so only the last term contributes
and gives us something proportional to $|{+}{+}{+}\rangle$. (And the same thing occurs with $|{-}{-}{-}\rangle$: the product $S_{1z}S_{2z}$
still yields a positive eigenvalue.)

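In order to understand the energy levels of this 8-dimensional vector space, we can define a total angular momentum
$\mathbf{S}_T = \mathbf{S}_1 + \mathbf{S}_2 + \mathbf{S}_3$ and rewrite the Hamiltonian as

$$H = \frac{\Delta}{2\hbar^2}\left(\mathbf{S}_T^2 - \mathbf{S}_1^2 - \mathbf{S}_2^2 - \mathbf{S}_3^2\right).$$

(Note that we can do this because the $x, y, z$ components of $\mathbf{S}_1, \mathbf{S}_2, \mathbf{S}_3$ all commute with each other.) But now the eigenvalues
for $\mathbf{S}_1^2, \mathbf{S}_2^2, \mathbf{S}_3^2$ are each always $\frac{3}{4}\hbar^2$, so the only thing that the system's energy level depends on is the total spin angular
momentum $s$: in particular, our energy eigenvalue is

$$E = \frac{\Delta}{2\hbar^2}\left(\hbar^2 s(s+1) - 3\cdot\tfrac{3}{4}\hbar^2\right) = \frac{\Delta}{2}\left(s(s+1) - \tfrac{9}{4}\right).$$

To understand the possible values of $s$, we just do the calculation

$$\tfrac{1}{2} \otimes \tfrac{1}{2} \otimes \tfrac{1}{2} = (1 \oplus 0) \otimes \tfrac{1}{2} = \left(1 \otimes \tfrac{1}{2}\right) \oplus \left(0 \otimes \tfrac{1}{2}\right) = \tfrac{3}{2} \oplus \tfrac{1}{2} \oplus \tfrac{1}{2},$$

which means the representations can be at spin $\frac{3}{2}$ (4 states) or $\frac{1}{2}$ (another 2 + 2 = 4 states), corresponding to energies
of $\frac{3}{4}\Delta$ and $-\frac{3}{4}\Delta$, respectively.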
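The spectrum can also be confirmed by diagonalizing the 8-dimensional Hamiltonian directly. This sketch assumes the prefactor of the Hamiltonian is $\Delta/\hbar^2$ and sets $\hbar = \Delta = 1$; the helper names `embed` and `dot` are illustrative and not from the lecture.

```python
import numpy as np

hbar, Delta = 1.0, 1.0  # illustrative units, an assumption of this sketch

# Spin-1/2 operators S = (hbar / 2) * sigma
sx = hbar / 2 * np.array([[0, 1], [1, 0]], dtype=complex)
sy = hbar / 2 * np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = hbar / 2 * np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def embed(op, site):
    """Place a single-spin operator on the given site (0, 1, or 2) of three spins."""
    mats = [I2, I2, I2]
    mats[site] = op
    return np.kron(np.kron(mats[0], mats[1]), mats[2])

def dot(a, b):
    """The operator S_a . S_b acting on the 8-dimensional space."""
    return sum(embed(s, a) @ embed(s, b) for s in (sx, sy, sz))

H = (Delta / hbar**2) * (dot(0, 1) + dot(1, 2) + dot(2, 0))
energies = np.round(np.linalg.eigvalsh(H), 10)
print(energies)  # four levels at -3/4 and four at +3/4, in units of Delta
```

The four states at $+\frac{3}{4}\Delta$ form the spin 3/2 multiplet, and the four states at $-\frac{3}{4}\Delta$ are the two spin 1/2 doublets.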


43 Density Matrices 


In this last set of lectures, we'll discuss ensembles and mixed states, and we'll be able to appreciate again that
probability plays a role in quantum mechanics. Remember that in classical mechanics, probability just comes up due to
a lack of knowledge: if we roll dice, we can theoretically always predict the result if we have enough information.
But this is not true in quantum mechanics anymore: even with perfect knowledge of our state $|\psi\rangle$, we will still need
probability every time we measure any observable.

One way we can measure this probability is to make many copies of our state $|\psi\rangle$, measure our observable many
times, and form a probability distribution with enough testing. But now, we are introducing a new source of randomness
in our system.


Definition 299 


Let $V$ be a vector space of states. A pure state is just some $|\psi\rangle \in V$, while a mixed state introduces some extra
randomness which we will describe now.


For a concrete example, consider the following: 


Example 300 


Suppose we have an oven which spits out silver atoms towards a Stern-Gerlach machine which causes deflections 


of those atoms. 


In such a system, every atom behaves like a spin 1/2 particle: we eventually find that they are either in the $|\mathbf{n}; +\rangle$
or $|\mathbf{n}; -\rangle$ state, but before going in the machine, they are polarized in all possible directions $\mathbf{n}$ (distributed randomly).
It's natural to ask whether this randomness is already accounted for. That is, can we write down a state $|\psi\rangle$ whose
intrinsic randomness describes the particles coming out of our oven before they hit the Stern-Gerlach machine?

We do know that any spin 1/2 state is in a superposition of the up and down states:

$$|\psi\rangle = a_+|+\rangle + a_-|-\rangle, \qquad a_+, a_- \in \mathbb{C}.$$

But we know that specifying these coefficients tells us the angles $\theta, \phi$ for the unit vector $\mathbf{n}$, so this fixes the direction
of the spin state, rather than picking it from a distribution! So we do need some additional randomness.

So we'll first consider the simple case where the atoms aren't completely uniformly distributed: instead, each particle
has a 50 percent chance of being spin up and a 50 percent chance of being spin down, always in the z-direction. We'll
describe this with ordered pairs

$$(p_a, |\psi_a\rangle),$$

which means that the particle comes out in the state $|\psi_a\rangle$ with probability $p_a$. So this oven that we've just described
can be written as

$$E_z = \left\{\left(\tfrac{1}{2}, |+\rangle\right), \left(\tfrac{1}{2}, |-\rangle\right)\right\}.$$

This is now an example of a mixed state: not all of our particles come out with the same wavefunction even before
they hit the Stern-Gerlach machine, which means they are not in the same quantum state. In other words, we may
have an ensemble where we take 2000 copies of this state, where 1000 copies are in the $|+\rangle$ state and the other 1000
are in the $|-\rangle$ state. Then when we make a measurement, we work with this ensemble instead. Let's make this definition
more generally:


Definition 301 
An ensemble $E$ is defined by

$$E = \{(p_1, |\psi_1\rangle), \ldots, (p_n, |\psi_n\rangle)\},$$

where the probabilities $p_i$ are all positive and sum to 1, and they dictate the likelihood of the corresponding normalized
(but not necessarily orthogonal) states $|\psi_i\rangle$.

Note that the dimension $\dim V$ of the vector space has nothing to do with the number of states $n$ we have in
our ensemble: we're not trying to form a basis or anything like that. Then $n = 1$ yields a pure state (the ensemble
collapses to a single known state, $E = \{(1, |\psi_1\rangle)\}$, so all particles are in this state), and $n \geq 2$ yields a mixed state.
So now suppose that we have some Hermitian operator $Q$, and we want to measure the expectation value of $Q$.
Then we can let

$$\langle Q\rangle_E = \sum_{i=1}^{n} p_i\,\langle\psi_i|Q|\psi_i\rangle,$$

since we should take the weighted average of the expectation values for each of the possible states we can have in our
ensemble.


Example 302 


In the example ensemble $E_z$ above, the expectation value is

$$\langle Q\rangle_{E_z} = \tfrac{1}{2}\left(\langle +|Q|+\rangle + \langle -|Q|-\rangle\right).$$


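To make this interesting, suppose that we have another ensemble

$$E_x = \left\{\left(\tfrac{1}{2}, |x; +\rangle\right), \left(\tfrac{1}{2}, |x; -\rangle\right)\right\}.$$

(Half of our particles start off pointing in the $+x$-direction, and the other half in the $-x$-direction.) Similarly, we'll
have

$$\langle Q\rangle_{E_x} = \tfrac{1}{2}\langle x; +|Q|x; +\rangle + \tfrac{1}{2}\langle x; -|Q|x; -\rangle.$$

But now we can write the expectation value in the states $|x; \pm\rangle$ in terms of the expectation in the states $|\pm\rangle$ via

$$|x; \pm\rangle = \frac{1}{\sqrt{2}}\left(|+\rangle \pm |-\rangle\right),$$

and now we can plug in to find

$$\langle Q\rangle_{E_x} = \tfrac{1}{2}\cdot\tfrac{1}{2}\left(\langle +| + \langle -|\right)Q\left(|+\rangle + |-\rangle\right) + \tfrac{1}{2}\cdot\tfrac{1}{2}\left(\langle +| - \langle -|\right)Q\left(|+\rangle - |-\rangle\right).$$

Now the $\langle +|Q|+\rangle$ and $\langle -|Q|-\rangle$ terms will combine, but the cross terms will cancel out, and we're left with

$$= \tfrac{1}{2}\langle +|Q|+\rangle + \tfrac{1}{2}\langle -|Q|-\rangle = \langle Q\rangle_{E_z}.$$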


So the expectation value for any Hermitian operator $Q$ always looks the same in both ensembles, even though the
ensembles are different! So if we try to measure anything at all, there's no way to get a different answer between
these two ensembles $E_z$ and $E_x$, and thus these are actually indistinguishable from each other quantum mechanically.
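Here is a short numerical check of this claim, using a randomly generated Hermitian matrix as a stand-in for $Q$. The random seed and the helper name `ensemble_expectation` are choices made for this sketch only.

```python
import numpy as np

up = np.array([1, 0], dtype=complex)
down = np.array([0, 1], dtype=complex)
x_plus = (up + down) / np.sqrt(2)
x_minus = (up - down) / np.sqrt(2)

# Ensembles as lists of (probability, state) pairs
E_z = [(0.5, up), (0.5, down)]
E_x = [(0.5, x_plus), (0.5, x_minus)]

def ensemble_expectation(ensemble, Q):
    """Weighted average of <psi|Q|psi> over the ensemble."""
    return sum(p * np.vdot(psi, Q @ psi) for p, psi in ensemble).real

# A random Hermitian operator standing in for Q
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
Q = A + A.conj().T

value_z = ensemble_expectation(E_z, Q)
value_x = ensemble_expectation(E_x, Q)
print(value_z, value_x)  # the two values agree
```

Both values also equal half the trace of $Q$, which is what the calculation above predicts once the cross terms cancel.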

We can also consider an unpolarized ensemble, where the spins all point in various directions. Then we will need
an infinite (in fact uncountable) list to describe the whole system, but we can still describe this as

$$E_{\text{unp}} = \left\{\left(\frac{d\Omega}{4\pi}, |\mathbf{n}(\Omega); +\rangle\right)\right\}_{d\Omega},$$

where we're adding over all solid angles, and the total solid angle is $4\pi$. Similarly, we can consider the ensemble

$$E_{\mathbf{n}} = \left\{\left(\tfrac{1}{2}, |\mathbf{n}; +\rangle\right), \left(\tfrac{1}{2}, |\mathbf{n}; -\rangle\right)\right\}$$

for some fixed vector $\mathbf{n}$. This is analogous to the ensembles $E_z$ and $E_x$ that we've defined earlier, and we can check
with a similar argument that both the unpolarized ensemble and $E_{\mathbf{n}}$ will be indistinguishable from $E_z$ and $E_x$ as
well. In other words, we can always describe an unpolarized ensemble by choosing the particles to point half-and-half
in some fixed direction.
From here, let's see another instance where mixed states come up:


Example 303 


Suppose we have an entangled state of two particles belonging to Alice and Bob, with state

$$|\psi_{AB}\rangle = \frac{1}{\sqrt{2}}\left(|+\rangle_A|-\rangle_B - |-\rangle_A|+\rangle_B\right).$$


This is the usual spin singlet state (which has total spin angular momentum 0, so it's rotationally invariant). We
know that if both Alice and Bob have full knowledge and measure along the z-axis, they will measure their particles
to be in opposite directions. But now suppose that Bob does not know what Alice's measurement is: then Bob's
particle is operationally in a mixed state

$$E_{B,z} = \left\{\left(\tfrac{1}{2}, |+\rangle\right), \left(\tfrac{1}{2}, |-\rangle\right)\right\}$$

(because if we have many copies of our entangled state, Alice will measure $+$ half the time and $-$ the other half of the
time). In fact, if Alice measures in an arbitrary direction $\mathbf{n}$, we can use rotational invariance to rewrite our entangled
state as

$$|\psi_{AB}\rangle = \frac{1}{\sqrt{2}}\left(|\mathbf{n}; +\rangle_A|\mathbf{n}; -\rangle_B - |\mathbf{n}; -\rangle_A|\mathbf{n}; +\rangle_B\right),$$

and now if Alice measures along the $\mathbf{n}$-direction and Bob doesn't know what the result is, Bob ends up in the state

$$E_{B,\mathbf{n}} = \left\{\left(\tfrac{1}{2}, |\mathbf{n}; +\rangle\right), \left(\tfrac{1}{2}, |\mathbf{n}; -\rangle\right)\right\},$$

which we know is physically the same ensemble as the ensemble we initially had. In other words, Alice's measurement
axis does not affect the ensemble for Bob, even though it does affect the particles! (And this is the more satisfactory
explanation for why we don't have instantaneous communication.)

And we can confirm that there is no way to avoid using mixed states here: suppose that we have some pure state
$|\psi_A\rangle$ that describes Alice's particle when it is entangled (if we only care about Alice's particle and not Bob's). Then
we would know that the expectation of an operator $Q$ is given by

$$\langle\psi_A|Q|\psi_A\rangle = \langle\psi_{AB}|Q \otimes 1|\psi_{AB}\rangle$$

(since we don't really do anything to Bob's particle). So now if we look at the case where our operator $Q$ is $\sigma_x$, we
know that $\sigma_x$ flips $|+\rangle$ and $|-\rangle$ into each other, so we have that

$$\langle\psi_{AB}|\sigma_x \otimes 1|\psi_{AB}\rangle = \frac{1}{\sqrt{2}}\left(\langle +|_A\langle -|_B - \langle -|_A\langle +|_B\right)\cdot\frac{1}{\sqrt{2}}\left(|-\rangle_A|-\rangle_B - |+\rangle_A|+\rangle_B\right) = 0,$$

because there is no overlap: there is no $|+\rangle|+\rangle$ or $|-\rangle|-\rangle$ in the original singlet state. Similarly, we can
calculate in the cases where $Q = \sigma_y$ and $Q = \sigma_z$ that the expectation is also zero, so any pure state that describes
Alice's particle in the singlet state must satisfy

$$\langle\psi_A|\sigma_x|\psi_A\rangle = \langle\psi_A|\sigma_y|\psi_A\rangle = \langle\psi_A|\sigma_z|\psi_A\rangle = 0.$$

And this isn't possible, because any pure spin state points in some direction, so there is some $\mathbf{n}$ such that

$$\langle\psi_A|\mathbf{n}\cdot\boldsymbol{\sigma}|\psi_A\rangle \neq 0.$$

But then $\mathbf{n}\cdot\boldsymbol{\sigma}$ is a linear combination of the $\sigma_i$'s, so if each of the individual $\sigma_i$'s has expectation value zero, so must
$\mathbf{n}\cdot\boldsymbol{\sigma}$, and this is a contradiction. So no pure state represents one particle of an entangled state, and we will
now need ensembles to describe something like the singlet state. The point now is to introduce a tool that helps us
describe such systems nicely. One of the ideas is that we want to avoid the issue where different-looking ensembles
actually correspond to the same situation.
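The contradiction above can be seen numerically. In this sketch Alice's spin is taken to be the first tensor factor, and the comparison state `psi` is a randomly generated pure state (the seed is arbitrary). It relies on the standard fact that the vector of Pauli expectation values of any pure spin 1/2 state has length exactly 1.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
paulis = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
}
up = np.array([1, 0], dtype=complex)
down = np.array([0, 1], dtype=complex)

# Singlet state, with Alice's spin as the first tensor factor
singlet = (np.kron(up, down) - np.kron(down, up)) / np.sqrt(2)

# <psi_AB| sigma_i (x) 1 |psi_AB> for i = x, y, z
alice_expectations = {
    axis: np.vdot(singlet, np.kron(sigma, I2) @ singlet).real
    for axis, sigma in paulis.items()
}
print(alice_expectations)  # all three vanish

# For comparison: a random pure single-spin state
rng = np.random.default_rng(2)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)
bloch = np.array([np.vdot(psi, sigma @ psi).real for sigma in paulis.values()])
print(np.linalg.norm(bloch))  # length 1 for any pure state
```

Since the three expectation values for Alice's half of the singlet are all zero while any pure state gives a vector of length 1, no pure state can reproduce them.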

Recall that an ensemble $E = \{(p_1, |\psi_1\rangle), \ldots, (p_n, |\psi_n\rangle)\}$ corresponds to an expectation value

$$\langle Q\rangle_E = \sum_a p_a\,\langle\psi_a|Q|\psi_a\rangle = \sum_a p_a \operatorname{tr}\left(Q\,|\psi_a\rangle\langle\psi_a|\right).$$

Here, we use the fact that $\operatorname{tr}(|u\rangle\langle v|) = \langle v|u\rangle$ (this is a manipulation we did in the past), and we can also use the fact
that the sum of traces is the trace of the sum to rewrite this as

$$= \operatorname{tr}\left(Q\sum_a p_a\,|\psi_a\rangle\langle\psi_a|\right).$$

So we have $Q$, and then we have some operator which only depends on our ensemble $E$. That's the operator
we're about to introduce in the general case:


Definition 304 


A density matrix is a linear operator $\rho_E \in \mathcal{L}(V)$ associated to an ensemble $E$ via

$$\rho_E = \sum_{a=1}^{n} p_a\,|\psi_a\rangle\langle\psi_a|.$$

In other words, we're describing our states with matrices instead of vectors, and we get the helpful fact that

$$\langle Q\rangle_E = \operatorname{tr}(Q\rho_E).$$


We can use this to look at our previous ensembles now: our ensemble $E_z$ can now be represented with the operator

$$\rho_{E_z} = \tfrac{1}{2}|+\rangle\langle +| + \tfrac{1}{2}|-\rangle\langle -| = \tfrac{1}{2}I,$$

because in general summing over an orthonormal basis yields the identity $\sum_i |i\rangle\langle i| = I$. So we also have

$$\rho_{E_x} = \tfrac{1}{2}|x; +\rangle\langle x; +| + \tfrac{1}{2}|x; -\rangle\langle x; -| = \tfrac{1}{2}I,$$

because $|x; +\rangle$ and $|x; -\rangle$ form an orthonormal basis as well. We find that $\rho_{E_{\text{unp}}}$ is also described by this matrix, and
now we can see that the density matrix is describing our states more powerfully than just using the ensemble: we can
easily tell when two ensembles are indistinguishable.


We can now check a few properties: 


* $\rho_E$ is a Hermitian operator, because each term in the sum $\sum_a p_a |\psi_a\rangle\langle\psi_a|$ is a Hermitian operator. (Remember that the adjoint of $|u\rangle\langle v|$ is $|v\rangle\langle u|$.) Therefore, it can be diagonalized, and it will have real eigenvalues.

* $\rho_E$ is a positive semidefinite matrix, which means that all of its eigenvalues are nonnegative. In mathematicians' language, a matrix $M$ is positive semidefinite if $(v, Mv) \geq 0$ for all $v \in V$. Therefore, if we take an eigenvector $v$ of unit length with eigenvalue $\lambda$, we get $(v, Mv) = (v, \lambda v) = \lambda \geq 0$. (As an exercise, we can show that any positive semidefinite matrix on a complex vector space must be Hermitian.) And this means that for any state $\psi$ in our state space $V$,

  $$\langle\psi|\rho_E|\psi\rangle \geq 0.$$

  This is because we can rewrite the above expression as

  $$\sum_a p_a \langle\psi|\psi_a\rangle \langle\psi_a|\psi\rangle = \sum_a p_a |\langle\psi|\psi_a\rangle|^2,$$

  and because the squared norms are nonnegative and the $p_a$ are probabilities, every term in this expression is at least 0.

* The trace of $\rho_E$ is 1 for any density matrix. This is because

  $$\mathrm{tr}(\rho_E) = \mathrm{tr}\left(\sum_a p_a |\psi_a\rangle\langle\psi_a|\right) = \sum_a p_a\, \mathrm{tr}(|\psi_a\rangle\langle\psi_a|) = \sum_a p_a \langle\psi_a|\psi_a\rangle = \sum_a p_a = 1,$$

  because the $|\psi_a\rangle$ are defined to be of unit length.

* As already mentioned, the density matrix removes redundancy in ensembles: no matter what combination of states we choose in a state space of dimension $n$, we always end up with a Hermitian $n \times n$ matrix, which is always specified by $n^2$ real constants (minus one if we fix the trace to be 1).

* Phases in the definitions of our states $|\psi_a\rangle$ do not affect $\rho_E$, because replacing $|\psi_a\rangle$ with $e^{i\alpha} |\psi_a\rangle$ will make the ket-bra look like

  $$e^{i\alpha} |\psi_a\rangle\, e^{-i\alpha} \langle\psi_a| = |\psi_a\rangle\langle\psi_a|,$$

  which is identical to what we started with.
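The properties in this list can be checked numerically on an arbitrary example. The following is a minimal sketch, assuming a randomly generated three-dimensional ensemble of five states; the dimensions, the seed, and the helper name `density_matrix` are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_states = 3, 5

# Random normalized states and random probabilities (illustrative ensemble).
states = rng.normal(size=(n_states, dim)) + 1j * rng.normal(size=(n_states, dim))
states /= np.linalg.norm(states, axis=1, keepdims=True)
probs = rng.random(n_states)
probs /= probs.sum()

def density_matrix(probs, states):
    """Return sum_a p_a |psi_a><psi_a|; each row of `states` is one ket."""
    return sum(p * np.outer(s, s.conj()) for p, s in zip(probs, states))

rho = density_matrix(probs, states)

is_hermitian = np.allclose(rho, rho.conj().T)
eigenvalues = np.linalg.eigvalsh(rho)
is_positive = bool(np.all(eigenvalues > -1e-12))   # nonnegative up to round-off
trace = np.trace(rho).real

# Multiply every state by its own arbitrary phase: rho should not change.
phases = np.exp(1j * rng.uniform(0, 2 * np.pi, size=n_states))
rho_rephased = density_matrix(probs, states * phases[:, None])
phase_invariant = np.allclose(rho, rho_rephased)
```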


In general, the density matrix is the best way to describe mixed states, and we often call it the state of our system 


or state operator. 


Example 305 


To help us study this object a bit more, let's consider the case where we have a pure state. 


This means we have an ensemble

$$E = \{(1, |\psi\rangle)\},$$

and plugging in the definition, we just have

$$\rho_E = |\psi\rangle\langle\psi|.$$

Because our state $\psi$ is normalized, this actually gives us a rank-1 orthogonal projector (of trace 1) onto the space spanned by the vector $|\psi\rangle$: we can check that $\rho^2 = \rho$ and $\rho^\dagger = \rho$. In other words, for a pure state, we have the property that

$$\mathrm{tr}(\rho^2) = \mathrm{tr}(\rho) = 1.$$

But in general, the trace of $\rho^2$ won't always be 1 when we have a mixed state:

Theorem 306

For any density matrix $\rho$, we have $\mathrm{tr}(\rho^2) \leq 1$, and saturation of this inequality only occurs when we have a pure state.


Proof. We know that

$$\mathrm{tr}(\rho^2) = \mathrm{tr}\left(\sum_a p_a |\psi_a\rangle\langle\psi_a| \sum_b p_b |\psi_b\rangle\langle\psi_b|\right) = \sum_{a,b} p_a p_b \langle\psi_a|\psi_b\rangle\, \mathrm{tr}(|\psi_a\rangle\langle\psi_b|)$$

by pulling all of the constants out of the trace and using linearity. But now the trace of $|\psi_a\rangle\langle\psi_b|$ is $\langle\psi_b|\psi_a\rangle$, which is the complex conjugate of $\langle\psi_a|\psi_b\rangle$. Thus, substituting this back in yields

$$\mathrm{tr}(\rho^2) = \sum_{a,b} p_a p_b |\langle\psi_a|\psi_b\rangle|^2.$$

Schwarz's inequality now tells us that

$$|\langle\psi_a|\psi_b\rangle|^2 \leq \langle\psi_a|\psi_a\rangle \langle\psi_b|\psi_b\rangle = 1,$$

and thus our sum is bounded by

$$\mathrm{tr}(\rho^2) \leq \sum_{a,b} p_a p_b = \sum_a p_a \sum_b p_b = 1,$$

with saturation only if $\psi_a$ and $\psi_b$ point in the same direction for every pair, which only occurs if all $|\langle\psi_a|\psi_b\rangle| = 1$. Since our states are normalized, this means our states only differ by a phase, which we can ignore. Thus, at equality, we can combine all terms and we just have a pure state, as desired.
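To make the inequality concrete, here is a small NumPy sketch with three hand-picked spin 1/2 ensembles. The helper names `purity` and `projector` and the particular states are assumptions chosen for illustration.

```python
import numpy as np

def purity(rho):
    """tr(rho^2) for a density matrix rho."""
    return np.trace(rho @ rho).real

def projector(psi):
    """|psi><psi| for a (not necessarily normalized) vector psi."""
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    return np.outer(psi, psi.conj())

up = [1, 0]
x_plus = [1, 1]

# A pure state, a mixture of two non-orthogonal states, and a "mixture"
# of two states that differ only by a phase (so it is secretly pure).
pure = projector(up)
mixed = 0.5 * projector(up) + 0.5 * projector(x_plus)
same_up_to_phase = 0.3 * projector(up) + 0.7 * projector([1j, 0])

purity_pure = purity(pure)               # 1
purity_mixed = purity(mixed)             # 1/4 + 1/4 + 2 * (1/4) * (1/2) = 0.75
purity_same = purity(same_up_to_phase)   # 1
```

The middle value follows the double sum in the proof: the two diagonal terms contribute $\frac14$ each, and the two cross terms contribute $\frac14 \cdot |\langle + | x;+ \rangle|^2 = \frac18$ each.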


Definition 307 
For any density matrix $\rho$, define the purity of the state to be

$$\zeta(\rho) = \mathrm{tr}(\rho^2).$$

A minimally mixed (or pure) state will then have the highest possible purity (1), and a maximally mixed state will be one with minimum purity. It turns out that the purity is helpful in dealing with unitary time-evolution, since it stays constant even though $\rho$ may not.

Proposition 308

The maximally mixed state $\rho$ is

$$\rho = \frac{1}{\dim V} I,$$

which has a purity of $\frac{1}{\dim V}$.

This should remind us of the characteristic examples from the beginning of this lecture.


Proof. Because the density matrix $\rho$ is Hermitian, we can diagonalize it, and we'll work in a basis in which $\rho$ only has diagonal entries

$$\rho = \mathrm{diag}(p_1, \cdots, p_n),$$

where $n = \dim V$. We know that the $p_i$ are nonnegative and sum to 1 (because the trace of $\rho$ is 1). Then

$$\rho^2 = \mathrm{diag}(p_1^2, \cdots, p_n^2) \implies \mathrm{tr}(\rho^2) = \sum_{i=1}^n p_i^2,$$

and now we can minimize this by being clever and using Cauchy-Schwarz. Alternatively, we can consider the function

$$L(p_1, \cdots, p_n, \lambda) = \sum_{i=1}^n p_i^2 - \lambda \left(-1 + \sum_i p_i\right),$$

where $\lambda$ is a free parameter, and now taking the derivative with respect to $\lambda$ yields our constraint $\sum_i p_i = 1$. So now we can find a minimum by taking the derivative with respect to any $p_i$:

$$\frac{\partial L}{\partial p_i} = 2 p_i - \lambda = 0,$$

and since this holds for all $p_i$, all $p_i$ must be equal at the minimum, meaning that each $p_i$ is equal to $\frac{1}{n}$. So the density matrix of lowest purity is the diagonal matrix with entries $\frac{1}{n}$, which is just $\frac{1}{\dim V} I$, as desired. And indeed the trace of $\rho^2$, which is a diagonal matrix with entries $\frac{1}{n^2}$, is

$$n \cdot \frac{1}{n^2} = \frac{1}{n} = \frac{1}{\dim V}.$$
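The minimization can also be checked by brute force. The sketch below samples random probability vectors in dimension $n = 4$ (the dimension, the sample count, and the use of a Dirichlet distribution are illustrative assumptions) and compares their purities with the uniform case.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# Eigenvalues of the maximally mixed state: all equal to 1/n.
uniform = np.full(n, 1 / n)
purity_uniform = np.sum(uniform**2)

# Random points on the probability simplex (each row is nonnegative and sums to 1).
samples = rng.dirichlet(np.ones(n), size=1000)
purities = np.sum(samples**2, axis=1)

lowest_sampled = purities.min()
highest_sampled = purities.max()
```

Every sampled purity lands between $\frac{1}{n}$ and 1, with the uniform distribution attaining the lower bound.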


In conclusion, ensembles determine density matrices, which are linear operators with certain useful properties. Our 


next step will be to consider certain spin 1/2 density matrices and then understand more properties of the theory. 


44 May 11, 2020 


Our final exam will be next Wednesday — there should be enough time to review, and it’s recommended that we go 
over everything that we’re uncomfortable with in the course. 
Last time, we discussed the Hamiltonian of a system of three spin 1/2 particles — we did half of the work, and we'll 


finish the discussion of that problem today. 


Problem 309 


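As a reminder, we were working with the Hamiltonian

$$H = \frac{\Delta}{\hbar^2} \left( \mathbf{S}_1 \cdot \mathbf{S}_2 + \mathbf{S}_2 \cdot \mathbf{S}_3 + \mathbf{S}_3 \cdot \mathbf{S}_1 \right).$$

In this class, we aren't dealing with issues of distinguishability: all particles are distinguishable.

The main trick here is to use the total angular momentum $\mathbf{S}_T = \mathbf{S}_1 + \mathbf{S}_2 + \mathbf{S}_3$ (where the secret meaning of the right side is a tensor product $\mathbf{S}_1 \otimes I \otimes I + I \otimes \mathbf{S}_2 \otimes I + I \otimes I \otimes \mathbf{S}_3$). Then we can expand out

$$\mathbf{S}_T^2 = \mathbf{S}_1^2 + \mathbf{S}_2^2 + \mathbf{S}_3^2 + 2 \left( \mathbf{S}_1 \cdot \mathbf{S}_2 + \mathbf{S}_2 \cdot \mathbf{S}_3 + \mathbf{S}_3 \cdot \mathbf{S}_1 \right) \implies H = \frac{\Delta}{2\hbar^2} \left( \mathbf{S}_T^2 - \mathbf{S}_1^2 - \mathbf{S}_2^2 - \mathbf{S}_3^2 \right).$$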


From here, the idea is to work with basis states that are eigenstates $|s, m\rangle$ of the total angular momentum $\mathbf{S}_T$ instead of our uncoupled states, meaning that we have

$$\mathbf{S}_T^2 |s, m\rangle = \hbar^2 s(s+1) |s, m\rangle, \quad S_{T,z} |s, m\rangle = \hbar m |s, m\rangle.$$

An important point to keep in mind here is that in a spin 1/2 system, we have equations like

$$S_x^2 = \frac{\hbar^2}{4} I,$$

and we get the same result for $S_y$ and $S_z$, so the squared operators $\mathbf{S}_1^2, \mathbf{S}_2^2, \mathbf{S}_3^2$ can all be treated as numbers in this Hamiltonian. And in general, when we have an angular momentum operator $\mathbf{L}^2$ acting on an $\ell$ multiplet, we know that $\mathbf{L}^2 |\ell, m\rangle = \hbar^2 \ell(\ell+1) |\ell, m\rangle$, so the operator $\mathbf{L}^2$ can be treated as a number $\hbar^2 \ell(\ell+1)$ (times the identity matrix). But this does not mean $L_x^2, L_y^2, L_z^2$ are necessarily proportional to the identity: that's something special to the spin 1/2 particle.

So if we're doing an angular momentum problem where we combine states $|j_1, m_1\rangle \otimes |j_2, m_2\rangle$, all such states (for a fixed $j_1, j_2$) are eigenstates of both $\mathbf{J}_1^2$ and $\mathbf{J}_2^2$. So when we rearrange them in terms of total angular momentum, so that the operators that we care about are now $\mathbf{J}^2$ and $J_z$, all states will be eigenstates of $\mathbf{J}_1^2$ and $\mathbf{J}_2^2$ as well. And now if we look back at our original three-spin problem, but we imagine that we're combining states of the form $|s_1, m_1\rangle \otimes |s_2, m_2\rangle \otimes |s_3, m_3\rangle$ where $s_1, s_2, s_3$ are definite, $\mathbf{S}_1^2$, $\mathbf{S}_2^2$, and $\mathbf{S}_3^2$ can still be treated as just numbers.

As another example of this, we can describe states of the hydrogen atom as either $(n, \ell, m_\ell, s, m_s)$ or $(n, \ell, j, s, m_j)$: the fact that $\ell$ and $s$ are being kept here reflects the fact that $\mathbf{L}^2$ and $\mathbf{S}^2$ can be thought of as numbers in both the coupled and the uncoupled basis.

So returning to the problem, we did a calculation last time to show that

$$\frac{1}{2} \otimes \frac{1}{2} \otimes \frac{1}{2} = \frac{3}{2} \oplus \frac{1}{2} \oplus \frac{1}{2}$$

(where we have a $2^3 = 4 + 2 + 2 = 8$ dimensional vector space). Then the total energy of our coupled eigenstates was calculated last time: it's $\frac{\Delta}{2} \left( s(s+1) - \frac{9}{4} \right)$ for total spin $s$, since the operators $\mathbf{S}_1^2, \mathbf{S}_2^2, \mathbf{S}_3^2$ are each $\frac{3}{4}\hbar^2$ times the identity, so four states go up by $\frac{3}{4}\Delta$ and the other four go down by $\frac{3}{4}\Delta$.
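The level structure can be confirmed by direct diagonalization. The sketch below works with $\hbar = 1$ and leaves out the overall energy prefactor of the Hamiltonian (both are assumptions made here so that the check only concerns the operator $\mathbf{S}_1 \cdot \mathbf{S}_2 + \mathbf{S}_2 \cdot \mathbf{S}_3 + \mathbf{S}_3 \cdot \mathbf{S}_1$); the helper names `embed` and `dot` are hypothetical.

```python
import numpy as np

# Spin-1/2 operators with hbar = 1.
sx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
sy = np.array([[0, -1j], [1j, 0]], dtype=complex) / 2
sz = np.array([[1, 0], [0, -1]], dtype=complex) / 2
I2 = np.eye(2, dtype=complex)

def embed(op, site):
    """Place a single-spin operator on one of the three sites (0, 1, 2)."""
    factors = [I2, I2, I2]
    factors[site] = op
    return np.kron(np.kron(factors[0], factors[1]), factors[2])

# S[site][k] is the k-th component of the spin operator on that site.
S = [[embed(op, site) for op in (sx, sy, sz)] for site in range(3)]

def dot(a, b):
    """S_a . S_b as an 8x8 matrix."""
    return sum(S[a][k] @ S[b][k] for k in range(3))

coupling = dot(0, 1) + dot(1, 2) + dot(2, 0)

S_total = [S[0][k] + S[1][k] + S[2][k] for k in range(3)]
S_total_sq = sum(op @ op for op in S_total)

coupling_levels = np.linalg.eigvalsh(coupling)        # ascending order
total_spin_levels = np.linalg.eigvalsh(S_total_sq)    # s(s+1) values
```

The eight eigenvalues of the coupling operator split into four at $-\frac{3}{4}$ (the two $s = \frac12$ doublets) and four at $+\frac{3}{4}$ (the $s = \frac32$ quadruplet), matching the decomposition above.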


Problem 310 


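What are the states of the $j = \frac{3}{2}$ multiplet for total angular momentum (in terms of the spin 1/2 states)?

We can condense notation by writing each product of spin 1/2 states $\left|\tfrac{1}{2}, \pm\tfrac{1}{2}\right\rangle$ as a string of signs, for example

$$\left|\tfrac{1}{2}, \tfrac{1}{2}\right\rangle \otimes \left|\tfrac{1}{2}, \tfrac{1}{2}\right\rangle \otimes \left|\tfrac{1}{2}, -\tfrac{1}{2}\right\rangle = |{+}{+}{-}\rangle.$$

We know that $|{+}{+}{+}\rangle$ and $|{-}{-}{-}\rangle$ both have a total z-component of angular momentum with $|m|$ larger than $\frac{1}{2}$, so both of them must be included in the multiplet (they are the states $|j, m\rangle = \left|\tfrac{3}{2}, \tfrac{3}{2}\right\rangle$ and $\left|\tfrac{3}{2}, -\tfrac{3}{2}\right\rangle$, respectively). To find the others, we can apply the lowering operator $J_- = J_{1-} + J_{2-} + J_{3-}$ on $|{+}{+}{+}\rangle$, noting that $|+\rangle$ becomes $\hbar |-\rangle$ under a lowering operator, to find

$$J_- |{+}{+}{+}\rangle = \hbar \left( |{-}{+}{+}\rangle + |{+}{-}{+}\rangle + |{+}{+}{-}\rangle \right),$$

but also

$$J_- |{+}{+}{+}\rangle = J_- \left|\tfrac{3}{2}, \tfrac{3}{2}\right\rangle = \hbar \sqrt{3} \left|\tfrac{3}{2}, \tfrac{1}{2}\right\rangle.$$

Setting these equal yields

$$\left|\tfrac{3}{2}, \tfrac{1}{2}\right\rangle = \frac{1}{\sqrt{3}} \left( |{-}{+}{+}\rangle + |{+}{-}{+}\rangle + |{+}{+}{-}\rangle \right)$$

(and notice that we also didn't need to actually keep track of the constants, since we know that $|{-}{+}{+}\rangle, |{+}{-}{+}\rangle, |{+}{+}{-}\rangle$ have equal contribution). Similarly, we can raise the $|{-}{-}{-}\rangle$ state to find that

$$\left|\tfrac{3}{2}, -\tfrac{1}{2}\right\rangle = \frac{1}{\sqrt{3}} \left( |{+}{-}{-}\rangle + |{-}{+}{-}\rangle + |{-}{-}{+}\rangle \right).$$

Remark 311. One important thing to keep in mind is that directly acting with the operators $J_\pm$ doesn't produce normalized states, because

$$J_\pm |j, m\rangle = \hbar \sqrt{j(j+1) - m(m \pm 1)}\, |j, m \pm 1\rangle,$$

and we have an extra $\hbar$ and another constant here.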
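The lowering step and the normalization constants from the remark can be reproduced with explicit $8 \times 8$ matrices. This is a minimal sketch with $\hbar = 1$; the helper names `kron3` and `ket` are illustrative.

```python
import numpy as np

# Single-spin lowering operator with hbar = 1 in the basis (|+>, |->):
# it sends |+> to |-> and annihilates |->.
s_minus = np.array([[0.0, 0.0], [1.0, 0.0]])
I2 = np.eye(2)

def kron3(a, b, c):
    """Tensor product of three factors (works for matrices and for vectors)."""
    return np.kron(np.kron(a, b), c)

J_minus = kron3(s_minus, I2, I2) + kron3(I2, s_minus, I2) + kron3(I2, I2, s_minus)

plus = np.array([1.0, 0.0])
minus = np.array([0.0, 1.0])

def ket(*spins):
    return kron3(*spins)

top = ket(plus, plus, plus)          # |3/2, 3/2>
lowered = J_minus @ top              # not normalized
norm_first = np.linalg.norm(lowered) # sqrt(3), from sqrt(j(j+1) - m(m-1))
state = lowered / norm_first         # |3/2, 1/2>

expected = (ket(minus, plus, plus) + ket(plus, minus, plus)
            + ket(plus, plus, minus)) / np.sqrt(3)

# Lowering once more from m = 1/2 gives the factor sqrt(15/4 + 1/4) = 2.
norm_second = np.linalg.norm(J_minus @ state)
```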


Problem 312 
From here, a natural extension is to find the states in each of the $j = \frac{1}{2}$ multiplets.

Note here that because both multiplets have equal $j$, there is not a unique answer. We can start each multiplet by finding a state $\left|\tfrac{1}{2}, \tfrac{1}{2}\right\rangle_1$ (the top state of one of the multiplets) which is orthogonal to $\left|\tfrac{3}{2}, \tfrac{1}{2}\right\rangle$. Such a state is of the form $\alpha |{-}{+}{+}\rangle + \beta |{+}{-}{+}\rangle + \gamma |{+}{+}{-}\rangle$, where $\alpha + \beta + \gamma = 0$ is the orthogonality condition: for example, we can use
11 1 
= = Feb, 
E 5), V2 


and then lowering this state yields $\left|\tfrac{1}{2}, -\tfrac{1}{2}\right\rangle_1$. Finally, we can find the last $m = \frac{1}{2}$ state orthogonal to the first two, which is a more complicated calculation: for instance, we can use


e3),- 6! ptt oa, 


Fact 313 


These last two lectures are not part of the 8.051 class for this semester due to COVID-19, but the notes are still 


included below. (There is also more content in each of these last two lectures.) 


45 Density Matrices: Decoherence 


Now that we've described density matrices generally, we'll do an example to help us discuss some more properties of 


these objects. 


Example 314 


Suppose we have a density matrix for a pure state of a spin 1/2 particle. 


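We know that this density matrix must be a projection operator onto some state $|\vec{n};+\rangle$ (abbreviated $|\vec{n}\rangle$ below), so it takes the form

$$|\vec{n}\rangle\langle\vec{n}|.$$

If we want to write this as a (Hermitian) matrix, we can write it as a superposition of the four basis Hermitian matrices: we'll say it takes the form

$$|\vec{n}\rangle\langle\vec{n}| = \frac{1}{2} a_0 I + \frac{1}{2} \sum_{i=1}^{3} a_i \sigma_i,$$

where the $\frac{1}{2}$ is to make the normalization a bit nicer. Taking the trace of both expressions, we find that

$$1 = \frac{1}{2} a_0\, \mathrm{tr}(I) = a_0 \implies a_0 = 1,$$

because the Pauli matrices are traceless. In order to find the other coefficients, we can multiply both sides by $\sigma_k$ and then take the trace:

$$\mathrm{tr}(\sigma_k |\vec{n}\rangle\langle\vec{n}|) = \mathrm{tr}\left( \frac{1}{2} \sigma_k + \frac{1}{2} \sum_{i=1}^{3} a_i \sigma_k \sigma_i \right).$$

Again, Pauli matrices are traceless, and because $\sigma_k \sigma_i = \delta_{ik} I + (\text{a multiple of a Pauli matrix})$, the only contribution to the trace comes from $i = k$, meaning

$$\mathrm{tr}(\sigma_k |\vec{n}\rangle\langle\vec{n}|) = \frac{1}{2} a_k\, \mathrm{tr}(I) = a_k.$$

In other words, we can write the coefficients $a_k$ in terms of an expectation value: the trace of $|a\rangle\langle b|$ is just $\langle b|a\rangle$, so plugging in $|a\rangle = \sigma_k |\vec{n}\rangle$ and $\langle b| = \langle\vec{n}|$,

$$a_k = \langle\vec{n}|\sigma_k|\vec{n}\rangle.$$

And now we can calculate this using the general formula for the spin state pointing in the direction $(\theta, \phi)$: we end up with $n_k$, the $k$th component of the unit vector $\vec{n}$. So now we know how to write down the density matrix of a general pure state:

$$|\vec{n}\rangle\langle\vec{n}| = \frac{1}{2} \left( I + \vec{n} \cdot \vec{\sigma} \right).$$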
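The result can be verified numerically for any direction. The sketch below assumes the standard parametrization $|\vec{n};+\rangle = \cos(\theta/2)|+\rangle + e^{i\phi}\sin(\theta/2)|-\rangle$ and an arbitrary test direction; the function name `spin_state` and the angle values are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
# sigma[k] is the k-th Pauli matrix (x, y, z).
sigma = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)

def spin_state(theta, phi):
    """Spin-1/2 state pointing along (theta, phi), written in the z basis."""
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

theta, phi = 0.7, 2.1   # arbitrary test direction
n = np.array([np.sin(theta) * np.cos(phi),
              np.sin(theta) * np.sin(phi),
              np.cos(theta)])

psi = spin_state(theta, phi)
projector = np.outer(psi, psi.conj())

# Coefficients a_k = tr(sigma_k |n><n|) = <n|sigma_k|n>.
a = np.array([np.trace(sigma[k] @ projector).real for k in range(3)])

# Right-hand side of the final formula: (I + n . sigma) / 2.
bloch_form = 0.5 * (I2 + np.einsum("k,kij->ij", n, sigma))
```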


To figure out the density matrix for a general mixed state, we can use this result, but first we should make sure we understand how to build such a mixed state. We know that we can go from an ensemble to a Hermitian, positive semidefinite matrix with trace 1: it turns out a kind of converse is also true.

Theorem 315

Given a Hermitian, unit trace, positive semidefinite matrix $M$, we can always view it as a density matrix for some associated ensemble $E_M$, such that $M = \rho_{E_M}$.


In other words, we just need to check a few properties to see if an operator is indeed a valid density matrix. 


Proof. Since $M$ is Hermitian and positive semidefinite, it can be diagonalized, and it will have eigenvalues $\lambda_1, \cdots, \lambda_N \geq 0$ such that $\lambda_1 + \cdots + \lambda_N = 1$ (because the trace is 1). Let $|e_i\rangle$ be the normalized eigenvector with eigenvalue $\lambda_i$, so $M |e_i\rangle = \lambda_i |e_i\rangle$. So we can write

$$M = \sum_i \lambda_i |e_i\rangle\langle e_i|$$

as the diagonal matrix with entries $\lambda_i$ in the $(i, i)$ spot, and now this is the exact form of the density matrix for the ensemble

$$E_M = \{(p_1, |\psi_1\rangle), \cdots, (p_N, |\psi_N\rangle)\},$$

where $p_i = \lambda_i$ and $|\psi_i\rangle = |e_i\rangle$. (Indeed, the probabilities add to 1 and are nonnegative.)
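The construction in this proof is exactly an eigendecomposition, so it can be sketched in a few lines of NumPy. The matrix `M` below is a randomly generated example (an assumption for illustration), built so that it is Hermitian, positive semidefinite, and of unit trace.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative Hermitian, positive semidefinite, unit-trace matrix.
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
M = A @ A.conj().T
M /= np.trace(M).real

# eigh returns real eigenvalues and orthonormal eigenvectors (as columns).
eigenvalues, eigenvectors = np.linalg.eigh(M)

# The associated ensemble: probability lambda_i attached to the state |e_i>.
ensemble = [(eigenvalues[i], eigenvectors[:, i]) for i in range(3)]

# Rebuilding the density matrix of this ensemble should return M.
rebuilt = sum(p * np.outer(e, e.conj()) for p, e in ensemble)
total_probability = sum(p for p, _ in ensemble)
```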


This is nice, because it gives us a clean way to describe a general density matrix in any system. 


Example 316 


Now we're ready to construct density matrices for mixed states of a spin 1/2 particle. 


A general mixed state is still supposed to be a Hermitian operator acting on the two-dimensional vector space, so we can still write it as

$$\rho = \frac{1}{2} a_0 I + \frac{1}{2} \sum_{i=1}^{3} a_i \sigma_i.$$

We can still take the trace of both sides, and because $\mathrm{tr}(\rho) = 1$, we still have $a_0 = 1$ by the same argument as above, meaning

$$\rho = \frac{1}{2} I + \frac{1}{2}\, \vec{a} \cdot \vec{\sigma}$$

for some unknown components of $\vec{a}$. In order to ensure that this is a valid density matrix, we just need to check the last property now, which is that its eigenvalues are all nonnegative. The eigenvalues of $\vec{a} \cdot \vec{\sigma}$ are $\pm|\vec{a}|$ (because the eigenvalues of $\vec{n} \cdot \vec{\sigma}$ are $\pm 1$ for a unit vector $\vec{n}$), which means the eigenvalues of $I + \vec{a} \cdot \vec{\sigma}$ are just $1 \pm |\vec{a}|$. (This is because any vector is an eigenvector of the identity matrix $I$, so in particular the eigenvectors of $\vec{a} \cdot \vec{\sigma}$ will work.) Therefore, the eigenvalues of $\rho$ are

$$\lambda_\pm = \frac{1}{2} \left( 1 \pm |\vec{a}| \right),$$

meaning the necessary condition on our coefficients (for eigenvalues to be nonnegative) is $|\vec{a}| \leq 1$. And now we've guaranteed positive semidefiniteness, and therefore any Hermitian matrix of this form satisfying this condition will be a valid density matrix: the most general mixed state looks like

$$\rho = \frac{1}{2} \left( I + \vec{a} \cdot \vec{\sigma} \right), \quad |\vec{a}| \leq 1.$$

In other words, $\vec{a} = (a_1, a_2, a_3)$ must live inside the closed unit ball $a_1^2 + a_2^2 + a_3^2 \leq 1$. Notably, when we're on the boundary $|\vec{a}| = 1$, the density matrix becomes a pure state (as above), and when we take the zero vector for $\vec{a}$, we get the maximally mixed state: indeed, we end up with $\frac{1}{2} I$, which is exactly the state with lowest purity that we derived last time.
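The eigenvalue formula $\lambda_\pm = \frac{1}{2}(1 \pm |\vec{a}|)$ is easy to probe numerically. In this sketch the four test vectors are arbitrary choices (inside the ball, on its boundary, at its center, and outside it), and the function names are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sigma = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)

def rho_from_bloch(a):
    """(I + a . sigma) / 2 for a real 3-vector a."""
    a = np.asarray(a, dtype=float)
    return 0.5 * (I2 + np.einsum("k,kij->ij", a, sigma))

def eigenvalues(a):
    """Eigenvalues of rho_from_bloch(a), in ascending order."""
    return np.linalg.eigvalsh(rho_from_bloch(a))

inside = eigenvalues([0.3, 0.0, 0.4])     # |a| = 0.5: 0.25 and 0.75
boundary = eigenvalues([0.0, 0.6, 0.8])   # |a| = 1: 0 and 1, a pure state
center = eigenvalues([0.0, 0.0, 0.0])     # maximally mixed: 0.5 and 0.5
outside = eigenvalues([0.0, 0.0, 1.5])    # |a| > 1: one eigenvalue is negative
```

The last case shows why the condition $|\vec{a}| \leq 1$ is needed: the matrix is still Hermitian with unit trace, but it is no longer positive semidefinite.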


From here, we'll move on and talk about the effect of measurements on density matrices. 


Example 317 


Suppose that we make a measurement along an orthonormal basis $\{|1\rangle, \cdots, |n\rangle\}$.

Remember that if we start with a single state $|\psi\rangle$, then the probability that we end up in the basis state $|i\rangle$ is

$$P(i) = |\langle i|\psi\rangle|^2.$$

So now suppose we have a mixed state of an ensemble $E = \{(p_1, |\psi_1\rangle), \cdots, (p_m, |\psi_m\rangle)\}$. Since we can be in various states with different probabilities, we now have to take a weighted average:

$$P(i) = \sum_a p_a |\langle i|\psi_a\rangle|^2.$$

We should be able to write this as a quantity that only depends on the density matrix, and that's what we'll work towards. Rewriting this expression more explicitly, we have that

$$P(i) = \sum_a p_a \langle i|\psi_a\rangle \langle\psi_a|i\rangle.$$

Since the $|i\rangle$s have nothing to do with the sum, we can pull them out of the sum and rewrite as

$$P(i) = \langle i| \left( \sum_a p_a |\psi_a\rangle\langle\psi_a| \right) |i\rangle.$$

And now the middle term is just the definition of the density matrix, and we have a nice result:

$$P(i) = \langle i|\rho|i\rangle.$$

But if we want to ask about the density matrix after measurement, notice that we'll end up in one of the states $|i\rangle$ with some probability. So our measurement is some operation which sends density matrices to other density matrices! Specifically, we know that ending up in the state $|i\rangle$ corresponds to the density matrix $E_i = |i\rangle\langle i|$, and this is nice to work with because $E_i^\dagger = E_i$ and $E_i E_i = E_i$ (so we have an orthogonal projector), and $\sum_i E_i = I$. So doing a general measurement (where we don't focus on what state we actually end up in) will give us a mixed state

$$\tilde{E} = \{(P(1), |1\rangle), \cdots, (P(n), |n\rangle)\},$$

meaning that we get a post-measurement density matrix

$$\tilde{\rho} = \sum_i P(i) |i\rangle\langle i| = \sum_i |i\rangle P(i) \langle i| = \sum_i |i\rangle \langle i|\rho|i\rangle \langle i|,$$

which we can write in terms of our orthogonal projectors as

$$\tilde{\rho} = \sum_{i=1}^{n} E_i \rho E_i.$$
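As a concrete instance of the measurement map $\rho \to \sum_i E_i \rho E_i$, here is a short sketch. The choice of initial state (the pure state $|x;+\rangle$) and of measurement basis (the z basis) are assumptions made for illustration.

```python
import numpy as np

# Example state: the pure state |x;+> of a spin 1/2, measured in the z basis.
x_plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(x_plus, x_plus.conj())

basis = np.eye(2, dtype=complex)                      # rows are |1>, |2>
projectors = [np.outer(v, v.conj()) for v in basis]   # E_i = |i><i|

# Outcome probabilities P(i) = <i|rho|i>.
probabilities = [np.vdot(v, rho @ v).real for v in basis]

# Post-measurement density matrix when the outcome is not recorded.
rho_after = sum(E @ rho @ E for E in projectors)

purity_before = np.trace(rho @ rho).real
purity_after = np.trace(rho_after @ rho_after).real
```

In this example the measurement turns a pure state into the maximally mixed state $\frac{1}{2} I$, so the purity drops from 1 to $\frac{1}{2}$.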


With this, we're now ready to examine the dynamics of the density matrix: 


Example 318 


Our initial focus will be on unitary time-evolution (similar to the dynamics that we discussed earlier in this class). 


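In order to describe this time-evolved $\rho(t)$, we'll again think about the density matrix in terms of a corresponding ensemble. We can start with the Schrödinger equation

$$\frac{\partial}{\partial t} |\psi\rangle = -\frac{i}{\hbar} H |\psi\rangle,$$

and then taking the adjoint of both sides yields

$$\frac{\partial}{\partial t} \langle\psi| = \frac{i}{\hbar} \langle\psi| H$$

(where nothing happens to $H$ because it is Hermitian). So we can already see what time-evolution looks like for a pure state density matrix: by the product rule,

$$\frac{\partial}{\partial t} \left( |\psi\rangle\langle\psi| \right) = -\frac{i}{\hbar} H |\psi\rangle\langle\psi| + \frac{i}{\hbar} |\psi\rangle\langle\psi| H = -\frac{i}{\hbar} [H, |\psi\rangle\langle\psi|].$$

In other words, the time-evolution can be written in terms of the commutator, and this may look familiar (it looks sort of like the Heisenberg equation of motion). But now we can generalize to a mixed state:

$$\frac{\partial \rho}{\partial t} = \frac{\partial}{\partial t} \sum_a p_a |\psi_a\rangle\langle\psi_a|,$$

and now applying the formula we derived for a pure state to each of the terms here yields

$$\frac{\partial \rho}{\partial t} = -\frac{i}{\hbar} \sum_a p_a [H, |\psi_a\rangle\langle\psi_a|].$$

Rearranging and bringing the sum inside the commutator, we can now write everything in terms of the density matrix itself:

$$i\hbar \frac{\partial \rho}{\partial t} = \left[ H, \sum_a p_a |\psi_a\rangle\langle\psi_a| \right] = [H, \rho].$$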


This is a clean differential equation, but it will turn out that not all density matrices evolve in this unitary manner (for instance, if we just look at a subsystem that is in contact with the rest of the system). We can say a few more things about this unitary time evolution, though: in the Schrödinger picture, we know that our state evolves via

$$|\psi, t\rangle = U(t) |\psi, 0\rangle,$$

and now if we think of $\rho(t = 0) = \sum_a p_a |\psi_a, 0\rangle\langle\psi_a, 0|$, we can apply the unitary operator to find

$$\rho(t) = \sum_a p_a |\psi_a, t\rangle\langle\psi_a, t| = \sum_a p_a U(t) |\psi_a, 0\rangle\langle\psi_a, 0| U^\dagger(t).$$

Since $U(t)$ and $U^\dagger(t)$ are in every term of the sum, this tells us that

$$\rho(t) = U(t) \left( \sum_a p_a |\psi_a, 0\rangle\langle\psi_a, 0| \right) U^\dagger(t) = U(t)\, \rho(t = 0)\, U^\dagger(t).$$

So just like the discussion we had earlier in the class, we can get both a differential equation for time evolution and an explicit formula in terms of the unitary operator $U(t)$: since our density matrix has a ket and a bra, we hit it with a $U$ from the left (for the ket) and a $U^\dagger$ from the right (for the bra). As a consequence of this, $\rho$ will remain Hermitian, unit trace, and positive semidefinite at all times if it starts off with those properties (all of these can be easily seen by examining the expression for $\rho(t)$ that we've just derived).

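And we can even take a look at how the purity of our state evolves in time: since $\zeta = \mathrm{tr}(\rho^2)$, we know that (trace commutes with the derivative)

$$\frac{d\zeta}{dt} = \mathrm{tr}\left( \frac{d\rho}{dt} \rho + \rho \frac{d\rho}{dt} \right) = \mathrm{tr}\left( \rho \frac{d\rho}{dt} + \rho \frac{d\rho}{dt} \right) = 2\, \mathrm{tr}\left( \rho \frac{d\rho}{dt} \right)$$

by cyclicity of trace, and now we can substitute in the expression we have above for the time-evolution of $\rho$ to find that this is

$$\frac{d\zeta}{dt} = \frac{2}{i\hbar} \mathrm{tr}(\rho [H, \rho]) = \frac{2}{i\hbar} \mathrm{tr}(\rho H \rho - \rho^2 H),$$

and again by cyclicity of trace we can turn this into

$$\frac{d\zeta}{dt} = \frac{2}{i\hbar} \mathrm{tr}(\rho H \rho - \rho H \rho) = 0.$$

In other words, the purity of a state does not change in time under unitary evolution: in fact, this argument generalizes to tell us that $\mathrm{tr}(\rho^3)$, $\mathrm{tr}(\rho^4)$, and so on are all time-independent as well.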
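Both the differential equation and the conservation of purity can be tested numerically. The sketch below assumes $\hbar = 1$, a time-independent Hamiltonian, and randomly generated `H` and `rho0` in three dimensions; all of these are illustrative choices, and $U(t) = e^{-iHt}$ is built from the eigendecomposition of `H`.

```python
import numpy as np

rng = np.random.default_rng(3)
dim = 3

# Illustrative random Hermitian Hamiltonian (hbar = 1) and random density matrix.
A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
H = (A + A.conj().T) / 2
B = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
rho0 = B @ B.conj().T
rho0 /= np.trace(rho0).real

energies, V = np.linalg.eigh(H)

def U(t):
    """U(t) = exp(-i H t), built from the eigendecomposition of H."""
    return V @ np.diag(np.exp(-1j * energies * t)) @ V.conj().T

def rho(t):
    """rho(t) = U(t) rho(0) U(t)^dagger."""
    return U(t) @ rho0 @ U(t).conj().T

def purity(r):
    return np.trace(r @ r).real

t, dt = 0.8, 1e-5
# Central finite difference for i d(rho)/dt, to compare against [H, rho].
lhs = 1j * (rho(t + dt) - rho(t - dt)) / (2 * dt)
rhs = H @ rho(t) - rho(t) @ H

cubic_before = np.trace(rho0 @ rho0 @ rho0).real
cubic_after = np.trace(rho(t) @ rho(t) @ rho(t)).real
```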


Example 319 


We'll now turn our attention to the case where we have a density matrix for a subsystem.


Here is where the density matrix becomes more interesting: we'll be considering bipartite systems, where a system 
can be broken up into two parts A and B. Basically, these two subsystems make up an isolated system (so the joint 
system evolves unitarily), but A and B can interact with each other. 

We saw an example of this earlier with two entangled particles A and B, and we found that we couldn't describe 
one particle with a single pure state. That idea will be generalized now: basically, we can describe A and B with density 
matrices, and these matrices will satisfy all of the fundamental properties, though the time-evolution will not be as 
simple because we don't have isolated subsystems. 

Let the $d_A$-dimensional Hilbert space for system A be $\mathcal{H}_A$, and suppose there are orthonormal basis states $e_1^A, \cdots, e_{d_A}^A$. Similarly, let the $d_B$-dimensional Hilbert space for system B be $\mathcal{H}_B$, and suppose there are orthonormal basis states $e_1^B, \cdots, e_{d_B}^B$. Then the joint system AB is bipartite, where A and B are generically entangled (so there isn't a pure state description for the subsystem A). It's possible AB is in a pure state, or it's possible that AB was prepared in such a way that it can only be represented as an ensemble or density matrix. Either way, we're claiming that we have a density matrix description for our subsystem, and here's how we'll phrase this point:

Proposition 320

Suppose we have a density matrix $\rho_{AB} \in \mathcal{L}(\mathcal{H}_A \otimes \mathcal{H}_B)$. Then

$$\rho_A = \mathrm{tr}_B(\rho_{AB}) = \sum_k \langle e_k^B|\rho_{AB}|e_k^B\rangle$$

is a valid density matrix that describes the subsystem A.

(We'll denote the trace of the whole system AB to be $\mathrm{tr}$, the trace of the subsystem A to be $\mathrm{tr}_A$, and the trace of the subsystem B to be $\mathrm{tr}_B$.)


Proof. We need to check that this density matrix is an operator on the subsystem A and satisfies the characteristic properties. For the calculations, a useful fact to recall is that

$$\mathrm{tr} = \mathrm{tr}_A \mathrm{tr}_B = \mathrm{tr}_B \mathrm{tr}_A$$

(as an important property of the tensor product space), so

$$\mathrm{tr}_A\, \rho_A = \mathrm{tr}_A \mathrm{tr}_B\, \rho_{AB} = \mathrm{tr}\, \rho_{AB} = 1$$

since $\rho_{AB}$ is a valid density matrix. Similarly, we can see that $\rho_A$ is positive semidefinite and Hermitian, and now we turn our attention to the main question: why is $\rho_A$ the right density matrix for the subsystem?

To answer this, recall the example we had with our entangled particles last lecture: if we have an operator $O_A$ acting on the space $\mathcal{H}_A$, then its extension to the tensor product space AB should be $O_A \otimes I_B$. In other words, we need to check that

$$\mathrm{tr}_A(\rho_A O_A) = \mathrm{tr}(\rho_{AB}\, O_A \otimes I_B)$$

for any operator $O_A$, which would tell us that whenever we want to compute the expectation value of an observable $O_A$ for the subsystem A (right side of the equation), we can indeed use the density matrix $\rho_A$ (left side of the equation).

To prove this, we can do an explicit calculation: we write down the most general form for our density matrix $\rho_{AB}$, which is

$$\rho_{AB} = \sum_{I,J} \rho_{IJ} |e_I\rangle\langle e_J|,$$

where $\rho_{IJ}$ are matrix elements, and we sum over all indices $I, J$ of the tensor product space $\mathcal{H}_A \otimes \mathcal{H}_B$. So each one should really correspond to two indices: letting $I$ run over $(i, \ell)$ and $J$ run over $(j, m)$, we have

$$\rho_{AB} = \sum_{i,j,\ell,m} \rho_{i\ell,jm}\, |e_i^A\rangle \otimes |e_\ell^B\rangle\, \langle e_j^A| \otimes \langle e_m^B|.$$

Reorganizing the notation a bit (and writing $\rho_{ij,\ell m}$ for the same matrix element $\rho_{i\ell,jm}$, so that the A indices and the B indices are grouped together), this can also be written as

$$\rho_{AB} = \sum_{i,j,\ell,m} \rho_{ij,\ell m}\, |e_i^A\rangle\langle e_j^A| \otimes |e_\ell^B\rangle\langle e_m^B|,$$

where the tensor product separates out the action of the linear operator on the A and B Hilbert spaces. We've now written down the most general density matrix $\rho_{AB}$, and now let's try to verify the property we want for $\rho_A$: by definition,

$$\rho_A = \mathrm{tr}_B\, \rho_{AB},$$

and now $\mathrm{tr}_B$ acts only on the B-side of our above expression and moves the ket from one side to the other:

$$\rho_A = \sum_{i,j,\ell,m} \rho_{ij,\ell m}\, |e_i^A\rangle\langle e_j^A|\, \langle e_m^B|e_\ell^B\rangle = \sum_{i,j,m} \rho_{ij,mm}\, |e_i^A\rangle\langle e_j^A|$$

because our basis vectors $e^B$ are orthonormal. And now this is indeed an operator on the subsystem A, satisfying

$$\mathrm{tr}(\rho_A O_A) = \sum_{i,j,m} \rho_{ij,mm} \langle e_j^A|O_A|e_i^A\rangle$$

by the same computation of the trace we've seen before (we move the ket to the right of the bra). So now we have the left hand side of the equation we're trying to derive, and now we can compute the right hand side: plugging in the definition of our general density matrix yields

$$\mathrm{tr}(\rho_{AB}\, O_A \otimes I_B) = \mathrm{tr} \sum_{i,j,\ell,m} \rho_{ij,\ell m}\, |e_i^A\rangle\langle e_j^A| O_A \otimes |e_\ell^B\rangle\langle e_m^B| I_B.$$

Since trace is linear, we bring it inside and take the trace for each of A and B, yielding

$$= \sum_{i,j,\ell,m} \rho_{ij,\ell m} \langle e_j^A|O_A|e_i^A\rangle\, \delta_{\ell m}.$$

And now setting $\ell = m$ to get rid of the Kronecker delta indeed makes this reduce to the same expression, as desired.

So the density matrix $\rho_A$ for our subsystem behaves consistently with how we think observables should act, and in some sense it just “erases” the information associated with the other system B.
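The index manipulation in this proof maps directly onto array operations. The following is a minimal sketch, assuming a randomly generated density matrix on a $2 \times 3$ dimensional bipartite space and a random Hermitian observable on A; the function name `partial_trace_B` is hypothetical. It relies on the fact that `np.kron` orders the composite index as (A index, B index).

```python
import numpy as np

rng = np.random.default_rng(4)
dA, dB = 2, 3

# Illustrative random density matrix on H_A (x) H_B.
G = rng.normal(size=(dA * dB, dA * dB)) + 1j * rng.normal(size=(dA * dB, dA * dB))
rho_AB = G @ G.conj().T
rho_AB /= np.trace(rho_AB).real

def partial_trace_B(rho, dA, dB):
    """rho_A[i, j] = sum_m rho[(i, m), (j, m)]."""
    return np.einsum("imjm->ij", rho.reshape(dA, dB, dA, dB))

rho_A = partial_trace_B(rho_AB, dA, dB)

# Random Hermitian observable acting on A only.
K = rng.normal(size=(dA, dA)) + 1j * rng.normal(size=(dA, dA))
O_A = (K + K.conj().T) / 2

lhs = np.trace(rho_A @ O_A)                          # tr_A(rho_A O_A)
rhs = np.trace(rho_AB @ np.kron(O_A, np.eye(dB)))    # tr(rho_AB O_A (x) I_B)
```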


Example 321 


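We'll return to the entangled particles for Alice and Bob

$$|\psi_{AB}\rangle = \frac{1}{\sqrt{2}} \left( |+\rangle_A |-\rangle_B - |-\rangle_A |+\rangle_B \right).$$

What is the density matrix $\rho_B$ that Bob sees?

Because $|\psi_{AB}\rangle$ is a pure state of the bipartite system, we know that

$$\rho_{AB} = |\psi_{AB}\rangle\langle\psi_{AB}| = \frac{1}{2} \left( |+\rangle_A |-\rangle_B - |-\rangle_A |+\rangle_B \right) \left( \langle+|_A \langle-|_B - \langle-|_A \langle+|_B \right).$$

We can expand and find the four terms, writing the A and B parts next to each other: this yields

$$\rho_{AB} = \frac{1}{2} \left( |+\rangle_A \langle+|_A \right) \otimes \left( |-\rangle_B \langle-|_B \right) + \frac{1}{2} \left( |-\rangle_A \langle-|_A \right) \otimes \left( |+\rangle_B \langle+|_B \right)$$

$$\quad - \frac{1}{2} \left( |+\rangle_A \langle-|_A \right) \otimes \left( |-\rangle_B \langle+|_B \right) - \frac{1}{2} \left( |-\rangle_A \langle+|_A \right) \otimes \left( |+\rangle_B \langle-|_B \right).$$

Our goal is to find

$$\rho_B = \mathrm{tr}_A\, \rho_{AB},$$

but the A-traces of the first two terms are 1 each, while the A-traces of the last two are 0 each, so we just end up with

$$\rho_B = \frac{1}{2} |-\rangle_B \langle-|_B + \frac{1}{2} |+\rangle_B \langle+|_B = \frac{1}{2} I_B.$$

In other words, B is maximally mixed when we take the “maximally entangled” state of AB and assume that Alice does nothing. But notice that even when Alice does make a measurement along any axis (and we don't know what the result is), Bob will also end up with a density matrix corresponding exactly to the one we've just found! So the description of B looks the same when Alice measures or does nothing, and we'll understand this a bit more in the coming discussion.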
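This example can be reproduced numerically. The sketch below assumes the entangled pair is the singlet state $\frac{1}{\sqrt{2}}(|+\rangle_A|-\rangle_B - |-\rangle_A|+\rangle_B)$ and, as an illustrative choice, lets Alice measure along the x axis; the function name `trace_out_A` is hypothetical.

```python
import numpy as np

plus = np.array([1, 0], dtype=complex)
minus = np.array([0, 1], dtype=complex)

# Singlet state of the pair; np.kron orders the factors as (A, B).
singlet = (np.kron(plus, minus) - np.kron(minus, plus)) / np.sqrt(2)
rho_AB = np.outer(singlet, singlet.conj())

def trace_out_A(rho):
    """rho_B[l, m] = sum_i rho[(i, l), (i, m)] for two spin-1/2 particles."""
    return np.einsum("ilim->lm", rho.reshape(2, 2, 2, 2))

rho_B = trace_out_A(rho_AB)

# Alice measures along the x axis and we do not learn the outcome.
x_plus = (plus + minus) / np.sqrt(2)
x_minus = (plus - minus) / np.sqrt(2)
projectors = [np.kron(np.outer(v, v.conj()), np.eye(2)) for v in (x_plus, x_minus)]
rho_AB_after = sum(E @ rho_AB @ E for E in projectors)
rho_B_after = trace_out_A(rho_AB_after)
```

The joint state changes under Alice's measurement, but Bob's reduced density matrix stays at $\frac{1}{2} I$.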


Problem 322 


Our next point of discussion is how to write down a nice description of a pure bipartite state $|\psi_{AB}\rangle$ in terms of the density matrices $\rho_A$ and $\rho_B$, known as the Schmidt decomposition (this is the same Schmidt as in the Gram-Schmidt procedure).

We know that $|\psi_{AB}\rangle$ lives in the tensor product space $\mathcal{H}_A \otimes \mathcal{H}_B$, so we can write it in terms of an orthonormal basis for each of $\mathcal{H}_A$ and $\mathcal{H}_B$: call them $\{|k_A\rangle\}$ and $\{|k_B\rangle\}$, respectively (where $k$ ranges from 1 to $d_A$ or $d_B$, respectively). Specifically, pick the bases such that $\rho_A$ and $\rho_B$, the Hermitian density matrices of the subsystems, are diagonal (so pick the eigenvectors of $\rho_A$ and $\rho_B$).

This allows us to write $|\psi_{AB}\rangle$ nicely as follows: we know that its density matrix is

$$\rho_{AB} = |\psi_{AB}\rangle\langle\psi_{AB}|$$

because we have a pure state, and we know that

$$\rho_A = \mathrm{tr}_B\, \rho_{AB}$$

is a $d_A \times d_A$ Hermitian matrix with some eigenvectors $|k_A\rangle$ and eigenvalues $p_k$. In fact, with our choice of basis, $\rho_A$ will be diagonal, since

$$\rho_A = \sum_k p_k |k_A\rangle\langle k_A|.$$


Remark 323. Note, however, that we don't always actually have $d_A$ different terms in this sum: many of the $p_k$ may turn out to be 0. So we're going to assume that we order the eigenvalues such that all of the zero $p_k$s occur at the end.

And for this reason, we can say that we sum from 1 to $r$ for some $r \leq d_A$ in the above expression. (Without loss of generality, we can assume $d_A \leq d_B$ for now.) And we'll use this to write down an ansatz for $|\psi_{AB}\rangle$: it's some linear combination of the basis states in our tensor product space, so we can write

$$|\psi_{AB}\rangle = \sum_{k=1}^{r} |k_A\rangle \otimes |\psi_k^B\rangle$$

for some states $\psi_k^B \in \mathcal{H}_B$ indexed by $k$ as well. And now remember that we should get $\rho_A$ when we take the B-trace of this expression, but the resulting density matrix only has $k$s appearing for $1 \leq k \leq r$. So we should only sum up to $r$ in this expression, and to make more progress we should use this ansatz to compute the density matrix $\rho_{AB}$: this yields

$$\rho_{AB} = |\psi_{AB}\rangle\langle\psi_{AB}| = \sum_{k,k'=1}^{r} |k_A\rangle |\psi_k^B\rangle \langle k'_A| \langle\psi_{k'}^B|.$$

Plugging this into the definition of $\rho_A$, we find that (again sliding the B-kets to the right of the B-bras)

$$\rho_A = \mathrm{tr}_B\, \rho_{AB} = \sum_{k,k'=1}^{r} |k_A\rangle\langle k'_A|\, \langle\psi_{k'}^B|\psi_k^B\rangle.$$

And now in order for this to be consistent with the diagonal expression for $\rho_A$ above, we must have the same matrix elements, meaning that

$$\langle\psi_{k'}^B|\psi_k^B\rangle = p_k \delta_{kk'}$$

(so that the diagonal entries are $p_k$ and all off-diagonal entries are zero). So the different $|\psi_k^B\rangle$s that show up in the expression for our pure bipartite state must be orthogonal.

With this notation, we can now define

$$|\psi_k^B\rangle = \sqrt{p_k}\, |k_B\rangle,$$

so that we have an orthonormal set of states $|k_B\rangle$ (where $1 \leq k \leq r$). And now we get the result we've been working towards:


Proposition 324 (Schmidt decomposition) 


A pure bipartite state can be written as

$$|\psi_{AB}\rangle = \sum_{k=1}^{r} \sqrt{p_k}\, |k_A\rangle \otimes |k_B\rangle,$$

where we have $r \leq \min(d_A, d_B)$, $\sum_k p_k = 1$, and orthonormal sets $\langle k_A|k'_A\rangle = \langle k_B|k'_B\rangle = \delta_{kk'}$, so that

$$\rho_A = \sum_{k=1}^{r} p_k |k_A\rangle\langle k_A|, \quad \rho_B = \sum_{k=1}^{r} p_k |k_B\rangle\langle k_B|.$$

(And the Gram-Schmidt procedure tells us that even when $r < d_A$ and $r < d_B$, we can still finish constructing an orthonormal basis for the entire state spaces $\mathcal{H}_A$ and $\mathcal{H}_B$.) In words, if we diagonalize the density matrices $\rho_A$ and $\rho_B$, that lets us write down the pure state $|\psi_{AB}\rangle$ nicely as well.


There are a few things we can observe about this representation: 


* Because the coefficients $p_k$ are the same for the density matrices of A and B, those density matrices $\rho_A, \rho_B$ have the same set of nonzero eigenvalues. (It's possible that the spaces are of different dimension, so we might have a higher multiplicity of 0 in one case than the other.)

* The integer $r$ is known as the Schmidt number of the decomposition: this is the number of terms in the density matrices, as well as the number of terms in the representation of the pure state $|\psi_{AB}\rangle$ itself. Here, $r$ can range from 1 to $\min(d_A, d_B)$: when $r = 1$, we have pure states in the subsystems A and B (that is, AB is not entangled), and otherwise we have entangled particles, meaning we can't factor into a state of A and a state of B because the density matrices $\rho_A, \rho_B$ are mixed states.

* The purities of $\rho_A$ and $\rho_B$ are the same: both of them are just

  $$\zeta = \mathrm{tr}(\rho^2) = \sum_{k=1}^{r} p_k^2.$$
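In practice, one way to obtain a Schmidt decomposition is the singular value decomposition of the coefficient matrix of the state. This is a swapped-in numerical technique rather than the argument used in these notes, and the random $2 \times 3$ state below is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(5)
dA, dB = 2, 3

# Illustrative random pure state; C[i, l] is the coefficient of |e_i^A> (x) |e_l^B>.
C = rng.normal(size=(dA, dB)) + 1j * rng.normal(size=(dA, dB))
C /= np.linalg.norm(C)
psi = C.reshape(-1)

# The SVD C = U diag(s) Vh gives the Schmidt form
# |psi> = sum_k s_k |k_A> (x) |k_B>, with |k_A> = U[:, k] and |k_B> = Vh[k, :].
U, s, Vh = np.linalg.svd(C, full_matrices=False)
p = s**2                      # Schmidt coefficients p_k

rebuilt = sum(s[k] * np.kron(U[:, k], Vh[k, :]) for k in range(len(s)))

# Reduced density matrices from partial traces of |psi><psi|.
rho_full = np.outer(psi, psi.conj()).reshape(dA, dB, dA, dB)
rho_A = np.einsum("imjm->ij", rho_full)
rho_B = np.einsum("ilim->lm", rho_full)

eig_A = np.linalg.eigvalsh(rho_A)   # ascending
eig_B = np.linalg.eigvalsh(rho_B)   # ascending; one extra zero since dB > dA
purity_A = np.trace(rho_A @ rho_A).real
purity_B = np.trace(rho_B @ rho_B).real
```

Here the Schmidt number is at most $\min(d_A, d_B) = 2$, and the extra eigenvalue of $\rho_B$ is zero, as described in the first bullet above.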


So now let's return to our canonical example of Alice and Bob sharing an entangled pair of particles: recall that 
the density matrix for Bob is unaffected under a measurement by Alice, unless we know the exact value of Alice’s 
measurement. (In both cases, we get the same maximally mixed state.) We'll make this result more general, and this 


is what is known as the no signaling or no communication result.

As discussed above, our density matrix evolves after a measurement via

$$\rho \to \tilde{\rho} = \sum_i E_i \rho E_i$$

when we're measuring along an orthonormal basis $\{|i\rangle\}$ and we define $E_i = |i\rangle\langle i|$. We've discussed this in the context of a single system (not entangled), but now we want to extend this to a bipartite system.

More specifically, suppose we have a system AB, and suppose Alice measures along an orthonormal basis $\{|i\rangle_A\}$. Then we have orthogonal projectors $E_i^A = |i\rangle_A \langle i|_A$ which satisfy the usual properties $(E_i^A)^\dagger = E_i^A$, $E_i^A E_i^A = E_i^A$, and $\sum_i E_i^A = I_A$, and now (in a completely analogous way as before) we have

$$\rho_{AB} \to \tilde{\rho}_{AB} = \sum_i (E_i^A \otimes I_B)\, \rho_{AB}\, (E_i^A \otimes I_B).$$

(We can check this as an exercise.)


Proposition 325 (No signaling) 


Under the above transformation, the density matrix $\tilde{\rho}_B = \mathrm{tr}_A\, \tilde{\rho}_{AB}$ is invariant (it is equal to $\rho_B$).


So the density matrix of the composite system will change, but Bob's will not. 


Proof. First, write our density matrix as a general sum

$$\rho_{AB} = \sum_j O_j^A \otimes O_j^B,$$

where $O_j^A$ and $O_j^B$ are some general operators in the $\mathcal{H}_A$ and $\mathcal{H}_B$ spaces. (This is possible because the tensor product space is spanned by vectors $|i\rangle_A \otimes |j\rangle_B$.) Then

$$\tilde{\rho}_B = \mathrm{tr}_A\, \tilde{\rho}_{AB} = \mathrm{tr}_A \sum_{i,j} (E_i^A \otimes I_B)(O_j^A \otimes O_j^B)(E_i^A \otimes I_B),$$

which can be simplified by taking the product of operators as

$$= \mathrm{tr}_A \sum_{i,j} E_i^A O_j^A E_i^A \otimes O_j^B.$$

But now if we take the trace term by term, we end up with the operator in the $\mathcal{H}_B$ space

$$= \sum_{i,j} \mathrm{tr}(E_i^A O_j^A E_i^A)\, O_j^B.$$

But now cyclicity of trace and the property of the projection operator tells us that

$$\sum_i \mathrm{tr}(E_i^A O_j^A E_i^A) = \sum_i \mathrm{tr}(E_i^A E_i^A O_j^A) = \sum_i \mathrm{tr}(E_i^A O_j^A),$$

and now we can bring the sum inside the trace:

$$= \mathrm{tr}\left( \sum_i E_i^A O_j^A \right) = \mathrm{tr}(O_j^A).$$

So the introduction of the $E_i^A$s has not contributed to the trace: bringing it back to the original expression, we're left with

$$\tilde{\rho}_B = \sum_j \mathrm{tr}(O_j^A)\, O_j^B,$$

which is indeed what we obtain if we take $\mathrm{tr}_A$ of the original density matrix $\rho_{AB}$.
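The proposition can also be tested on a generic example. The sketch below assumes a random density matrix on a $3 \times 2$ dimensional bipartite space and a random orthonormal measurement basis for Alice, taken as the columns of a unitary matrix produced by a QR factorization; all of these are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)
dA, dB = 3, 2

# Illustrative random density matrix on H_A (x) H_B.
G = rng.normal(size=(dA * dB, dA * dB)) + 1j * rng.normal(size=(dA * dB, dA * dB))
rho_AB = G @ G.conj().T
rho_AB /= np.trace(rho_AB).real

def trace_out_A(rho):
    """rho_B[l, m] = sum_i rho[(i, l), (i, m)]."""
    return np.einsum("ilim->lm", rho.reshape(dA, dB, dA, dB))

# Random orthonormal measurement basis for Alice: the columns of a unitary Q.
Q, _ = np.linalg.qr(rng.normal(size=(dA, dA)) + 1j * rng.normal(size=(dA, dA)))
projectors = [np.kron(np.outer(Q[:, i], Q[:, i].conj()), np.eye(dB))
              for i in range(dA)]

# Alice measures and the outcome is not communicated.
rho_AB_after = sum(E @ rho_AB @ E for E in projectors)

rho_B_before = trace_out_A(rho_AB)
rho_B_after = trace_out_A(rho_AB_after)
```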


Problem 326 


We're now ready to return to the question of time evolution, now that we have an “open” system (or subsystem 


of the whole world). 


We're still focusing on bipartite systems here, but we'll call our system AE this time (where A is what we care 
about, and E is the outside environment). F is often larger than A (for example, when we have a thermal ensemble), 
but because AE is still a quantum system, it still evolves unitarily. A is known here as an open system. 


The whole system can be described with a density matrix $\rho_{AE}$, and what we care about is our subsystem

$$\rho_A = \operatorname{tr}_E \rho_{AE}.$$

We want to know about $\rho_A$'s evolution in time, and we'll first show that it's not necessarily unitary. To understand this, consider a bipartite system of two unentangled spins in a pure state at time $t = 0$: in other words, both A and B have pure state descriptions at first.

But it’s possible that interactions between A and B can cause the two particles to become entangled, meaning 
that A’s description is now only possible with a mixed state. This transition from a pure state to a (nontrivial) density 
matrix is called decoherence, and it’s only possible when we don’t have unitary time evolution (because the purity 
has changed, which isn't allowed in unitary time evolution). 

So in general, we can't actually say very much about the evolution of $\rho_A$, other than that the environment's behavior can lead to decoherence. (Typically, we go from a pure to a mixed state and stay mixed forever if the environment is large enough.) But we can say that

$$\rho_A(t) = \operatorname{tr}_E \rho_{AE}(t),$$

and because AE evolves unitarily, this is

$$\operatorname{tr}_E \left( U(t)\, \rho_{AE}(0)\, U^\dagger(t) \right)$$

for some unitary operator $U$. And that's about as much as we can say: since we're taking the partial trace over E, not the whole matrix, we can't use cyclicity of trace.

So suppose we have a pure state at time $t = 0$, where the system A and environment E are in pure (unentangled) states $|\psi_A\rangle$ and $|E\rangle$, respectively. Then the density matrix for the whole system takes on a simple form

$$\rho_{AE}(0) = |\psi_A\rangle \langle \psi_A| \otimes |E\rangle \langle E|,$$

and we can plug this into the formula above, assuming we know the Hamiltonian of our combined system. But there's still a possibility of decoherence even in this case, and that's best illustrated with an example.

Example 327


Suppose a box has two spins, and we have a Hamiltonian of

$$H = -\hbar\omega\, \sigma_z^{(1)} \otimes \sigma_z^{(2)}.$$

Also, suppose we start with an initial condition

$$|\Psi_{12}(0)\rangle = |x;+\rangle_1 \otimes |x;+\rangle_2.$$


In this case, the energy is minimized if the two spins point in the same z-direction (so there is indeed an interaction 
between the two particles). We know that we have an initial pure state of two particles that are not entangled, and 
our question is basically “what can we say about the density matrix of particle 1?”. 


To approach this question, we know that

$$\rho_{12}(0) = |\Psi_{12}(0)\rangle \langle \Psi_{12}(0)|$$

because we have a pure (total) state, which means that

$$\rho_{12}(t) = e^{-iHt/\hbar}\, \rho_{12}(0)\, e^{iHt/\hbar}$$

because we have a time-independent Hamiltonian. And from here, we find that

$$\rho_1(t) = \operatorname{tr}_2 \rho_{12}(t),$$


and now we just need to go through all of the calculations to figure out how the density matrix evolves in time. We'll skip to the answer for now (we can try doing out the math ourselves); the result is that

$$\rho_1(t) = \frac{1}{2} |+\rangle\langle +| + \frac{1}{2} |-\rangle\langle -| + \frac{1}{2} \cos(2\omega t) \left( |+\rangle\langle -| + |-\rangle\langle +| \right).$$

At time $t = 0$, we have four equally weighted terms, so the density matrix looks like $\begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}$. Another way to phrase this is that if we measured the two particles along the z-direction, we'd get a probability of $\frac{1}{4}$ for each result in $\{++, +-, -+, --\}$ (because particles along the x-direction have equal chance to be $+z$ or $-z$ when we measure, and the two particles in our system started off independent). Notably, this is a pure state, and it's only a pure state because of the nonzero off-diagonal terms.

But a little later (at time $t = \frac{\pi}{4\omega}$), the $\cos(2\omega t)$ term disappears, and then our matrix will look like $\begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix}$, and now we have a maximally mixed state for particle 1! And we can check that as a function of time,

$$\zeta = \operatorname{tr}(\rho_1^2) = 1 - \frac{1}{2} \sin^2(2\omega t).$$

So the purity oscillates between 1 (a pure state) and $\frac{1}{2}$ (a maximally mixed state) for all time. This is a toy model
where we can see decoherence happening — it’s too simple to understand something like decoherence in quantum 
computers — but it does illustrate that pure states do not need to stay pure for subsystems. 
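The calculation that the example skips can be checked numerically. The following is a minimal sketch, assuming units with $\hbar = 1$ and an arbitrary value $\omega = 1$; the function names `reduced_rho1` and `purity` are illustrative. It evolves the two-spin state, traces out spin 2, and compares the purity of spin 1 against $1 - \frac{1}{2}\sin^2(2\omega t)$.

```python
import numpy as np

omega = 1.0  # illustrative value; units with hbar = 1 (assumption)
sz = np.diag([1.0, -1.0])
H = -omega * np.kron(sz, sz)  # H = -hbar*omega sigma_z^(1) x sigma_z^(2)

x_plus = np.array([1.0, 1.0]) / np.sqrt(2)  # |x;+> in the z-basis
psi0 = np.kron(x_plus, x_plus).astype(complex)

def reduced_rho1(t):
    # H is diagonal in the z-basis, so exp(-iHt) is a diagonal matrix of phases.
    psi_t = np.exp(-1j * np.diag(H) * t) * psi0
    rho12 = np.outer(psi_t, psi_t.conj())
    # Tensor indices are (s1, s2, s1', s2'); trace out spin 2.
    return np.einsum("abcb->ac", rho12.reshape(2, 2, 2, 2))

def purity(t):
    rho1 = reduced_rho1(t)
    return np.trace(rho1 @ rho1).real

times = np.linspace(0.0, np.pi / omega, 9)
numeric = np.array([purity(t) for t in times])
analytic = 1.0 - 0.5 * np.sin(2 * omega * times) ** 2
print(np.allclose(numeric, analytic))
```

The purity returns to 1 whenever $2\omega t$ is a multiple of $\pi$, so in this toy model the decoherence is periodic rather than permanent.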

So now we can return to the unitary time evolution equation

$$\frac{\partial \rho}{\partial t} = \frac{1}{i\hbar} [H, \rho].$$

This does not suffice for describing subsystems that do not evolve with a unitary operator, but the last topic of today is generalizing this equation to something called the Lindblad equation. It's not completely general, but it is able to modify the above equation in a way that preserves the Hermiticity, unit trace, and positive semidefiniteness of $\rho$, while removing the assumption of unitary time evolution.

We've just noticed that going from a pure to a mixed state is not always easy to do, and our approach here will be
to construct a phenomenological equation — that is, it is not derived from first principles, but it is consistent with 
our observations. Basically, we'll make the argument that a small open system can have its coherence and information 


“dissipated” into a large environment without disrupting that environment very much. 


Proposition 328 (Lindblad equation) 


In certain systems, we have the governing equation 


$$\frac{\partial \rho}{\partial t} = \frac{1}{i\hbar} [H, \rho] + \sum_k \left( L_k \rho L_k^\dagger - \frac{1}{2} \{ L_k^\dagger L_k, \rho \} \right),$$

where the $L_k$ are the Lindblad operators and the range of $k$ depends on the system.

(Recall that $\{A, B\}$ is the anticommutator $AB + BA$.) Here, the right hand side is constructed so that it is Hermitian: the $L_k$s do not talk to each other, and indeed every term that we see here is Hermitian because it's equal to its dagger.

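Understanding the other parts of this equation, such as why we have a $-\frac{1}{2}$ constant in this equation, will come about when we verify that this matrix has a constant unit trace: we wish to show that

$$\frac{d}{dt} \operatorname{tr}(\rho) = \operatorname{tr}\left( \frac{\partial \rho}{\partial t} \right) = \operatorname{tr}\left( \frac{1}{i\hbar} [H, \rho] + \sum_k \left( L_k \rho L_k^\dagger - \frac{1}{2} \{ L_k^\dagger L_k, \rho \} \right) \right)$$

is equal to zero. But trace of a commutator vanishes by cyclicity, so the first term goes away. Then we just need to check that the contribution from each $L_k$ is zero: indeed,

$$\operatorname{tr}\left( L_k \rho L_k^\dagger - \frac{1}{2} \{ L_k^\dagger L_k, \rho \} \right) = \operatorname{tr}\left( L_k \rho L_k^\dagger - \frac{1}{2} L_k^\dagger L_k \rho - \frac{1}{2} \rho L_k^\dagger L_k \right),$$

and now all of these three terms are just cyclic shifts of each other, so we can reorder and get $\operatorname{tr}(0) = 0$. So we have verified that the trace is constant in time.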
Showing positive semidefiniteness is also not too difficult: we show that in a time dt, a positive semidefinite matrix 


will stay positive semidefinite through this evolution. And this is left as an exercise for us. 
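The trace and Hermiticity arguments can be spot-checked numerically. This is a minimal sketch, assuming $\hbar = 1$ and randomly generated $H$, $\rho$, and Lindblad operators; the function name `lindblad_rhs` is illustrative and not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, hbar = 3, 1.0  # illustrative dimension; hbar = 1 (assumption)

def random_matrix(dim, rng):
    return rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))

def lindblad_rhs(rho, H, Ls, hbar=1.0):
    # (1/(i hbar)) [H, rho] + sum_k ( L_k rho L_k^dag - (1/2){L_k^dag L_k, rho} )
    out = (H @ rho - rho @ H) / (1j * hbar)
    for L in Ls:
        LdL = L.conj().T @ L
        out += L @ rho @ L.conj().T - 0.5 * (LdL @ rho + rho @ LdL)
    return out

m = random_matrix(dim, rng)
rho = m @ m.conj().T
rho /= np.trace(rho)                       # unit-trace, positive semidefinite
h = random_matrix(dim, rng)
H = h + h.conj().T                         # Hermitian Hamiltonian
Ls = [random_matrix(dim, rng) for _ in range(2)]  # arbitrary Lindblad operators

drho = lindblad_rhs(rho, H, Ls, hbar)
print(abs(np.trace(drho)) < 1e-10, np.allclose(drho, drho.conj().T))
```

Both checks pass for arbitrary (even non-Hermitian) $L_k$, which reflects that the structure of the equation, not a special choice of operators, is what preserves the trace and Hermiticity.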


Example 329 


A classic case of decoherence we've already started studying earlier in the class is nuclear magnetic resonance. 


Remember that in this system, we have a spin, and we have a magnetic field in the z-direction. We then introduce 
an additional signal which makes this spin state rotate, and that’s the rotation that is picked up by detectors in practical 
applications. But it turns out the lattice of surrounding atoms interacts with the spin in question, which will cause 
decoherence of the circular motion, known as transverse relaxation. 

The constant $T_2$ measures how long it takes for this to occur: once this happens, the spin behavior is destroyed (and the particle basically stops spinning). We also have a related constant $T_1$, which is the longitudinal relaxation time. What we discover there is that the spin stops rotating and starts being described by a probability of being spin up or down: due to thermal effects from the surroundings, we will eventually get some proportion of the states pointing up versus down (based on the Boltzmann distribution from statistical physics), and $T_1$ controls how long it takes for this to happen.

In most materials $T_2 < T_1$, and these are the times that we can detect for different materials with a physical machine. So studying this example more carefully is important both practically and for our current understanding.


Suppose that our state points in the x-direction at some time: then our wavefunction looks like

$$|\psi\rangle = \frac{1}{\sqrt{2}} \left( |+\rangle + |-\rangle \right),$$

which corresponds to a density matrix of

$$\rho = \frac{1}{2} \left( |+\rangle\langle +| + |+\rangle\langle -| + |-\rangle\langle +| + |-\rangle\langle -| \right) = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}.$$

(This is a pure state.) In such a matrix, the off-diagonal $\frac{1}{2}$ terms are called coherences: after all, if those two terms were zero, we would have a maximally mixed state, which has no coherence at all. So transverse relaxation affecting this rotation means that by the time our coherences are suppressed, we have complete decoherence (the diagonal density matrix corresponds to a particle that has a $\frac{1}{2}$ chance to be in $+z$ and a $\frac{1}{2}$ chance to be in $-z$). It's also possible that the probabilities at the end of the day are not quite 50-50: it's possible that the diagonal terms are 0.52 and 0.48 or something, due to the magnetic field. (And this is where $T_1$ comes into play.)


So now let's talk about this in more generality. Suppose our initial matrix looks like

$$\rho(0) = \begin{pmatrix} \rho_{++}(0) & \rho_{+-}(0) \\ \rho_{-+}(0) & \rho_{--}(0) \end{pmatrix}.$$

Intuitively, what we should expect to happen is that the transverse relaxation eventually kills the off-diagonal terms, so we are always going to go into a mixed state (eventually corresponding to a near-diagonal matrix). And $T_1$ should adjust the diagonal terms according to the Boltzmann distribution, and we want to know what kind of Lindblad equation can model this to get us the correct form that we want. (One way to phrase this is that Lindblad operators drive our time evolution.)


It turns out that we'll use three Lindblad operators. We'll make the simplification that we have no magnetic field, so $B = 0$ and we'll eventually end up with a maximally mixed state $\begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix}$. Define

$$L_1 = \alpha |+\rangle\langle -|, \qquad L_2 = \alpha |-\rangle\langle +|,$$

so $L_1$ and $L_2$ basically swap $+$ and $-$, meaning they mostly affect the longitudinal relaxation: they change the population of our eventual $+$ and $-$ states. We say that $\alpha$ is some real number; since we always have $L$ and $L^\dagger$ appearing at the same time, we just end up with a contribution of $|\alpha|^2$ anyway. And we'll also need

$$L_3 = \beta \sigma_z:$$

the purpose of this operator is that

$$L_3 \rho L_3^\dagger = \beta^2 \sigma_z \rho \sigma_z$$

changes the signs of the off-diagonal terms, so in our Lindblad equation we are driving the off-diagonal terms to zero (because this $L_3 \rho L_3^\dagger$ is a term that affects $\frac{\partial \rho}{\partial t}$). So now our matrix in general will look like

$$\rho(t) = \begin{pmatrix} \rho_{++}(t) & \rho_{+-}(t) \\ \rho_{-+}(t) & \rho_{--}(t) \end{pmatrix},$$

and we now need to plug everything in to the Lindblad equation by calculating matrix products: what we end up with is that

$$\begin{pmatrix} \dot{\rho}_{++}(t) & \dot{\rho}_{+-}(t) \\ \dot{\rho}_{-+}(t) & \dot{\rho}_{--}(t) \end{pmatrix} = \begin{pmatrix} -\alpha^2 (\rho_{++} - \rho_{--}) & -(\alpha^2 + 2\beta^2)\, \rho_{+-} \\ -(\alpha^2 + 2\beta^2)\, \rho_{-+} & -\alpha^2 (\rho_{--} - \rho_{++}) \end{pmatrix}$$

(the left side is the time derivative of the density matrix, and we're computing the right side explicitly, skipping the calculations).

Looking at this equation, we can already see some of the physics: $\alpha$ should affect longitudinal relaxation, and indeed the diagonal terms will be driven towards $(\frac{1}{2}, \frac{1}{2})$, because a larger $\rho_{++}$ than $\rho_{--}$ drives the top left expression down and the bottom right expression up. (In the case where $B = 0$, this means equilibrium occurs when the terms of $\rho_{++}$ and $\rho_{--}$ are the same.)

But also $\beta$ (along with some help from $\alpha$) gives us a simple exponential decay of the off-diagonal terms, because the time-derivative of each term is just proportional to its value! We'll find that the off-diagonal terms evolve via

$$\rho_{+-}(t) = \rho_{+-}(0)\, e^{-t/T_2}, \qquad \rho_{-+}(t) = \rho_{-+}(0)\, e^{-t/T_2}, \qquad T_2 = \frac{1}{\alpha^2 + 2\beta^2}$$

(this is a short time if we make $\beta$ very large, but we can also consider the case where $\beta = 0$ and the constant of decay for longitudinal and transverse relaxation is of the same order). Similarly, the diagonal terms evolve via

$$\rho_{++}(t) = \frac{1}{2} + e^{-t/T_1} \left( \rho_{++}(0) - \frac{1}{2} \right), \qquad \rho_{--}(t) = \frac{1}{2} + e^{-t/T_1} \left( \rho_{--}(0) - \frac{1}{2} \right)$$

(as $t$ gets large, the extra terms decay exponentially), and $T_1 = \frac{1}{2\alpha^2}$. (And now we see that a large $\beta$ is indeed necessary for us to have a physically correct model: $T_2 < T_1$ requires $2\beta^2 > \alpha^2$.)
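The relaxation formulas can be compared against a direct numerical integration of the Lindblad equation. The sketch below assumes $H = 0$ (since $B = 0$), illustrative values of $\alpha$ and $\beta$, an arbitrary valid initial density matrix, and a hand-written fourth-order Runge-Kutta step; none of these choices come from the lecture.

```python
import numpy as np

alpha, beta = 0.7, 1.1  # illustrative values (assumption)
ket_p = np.array([[1.0], [0.0]], dtype=complex)   # |+>
ket_m = np.array([[0.0], [1.0]], dtype=complex)   # |->
sigma_z = np.diag([1.0, -1.0]).astype(complex)

Ls = [alpha * ket_p @ ket_m.conj().T,   # L1 = alpha |+><-|
      alpha * ket_m @ ket_p.conj().T,   # L2 = alpha |-><+|
      beta * sigma_z]                   # L3 = beta sigma_z

def rhs(rho):
    # H = 0 because B = 0, so only the dissipative Lindblad terms remain.
    out = np.zeros_like(rho)
    for L in Ls:
        LdL = L.conj().T @ L
        out += L @ rho @ L.conj().T - 0.5 * (LdL @ rho + rho @ LdL)
    return out

def evolve(rho0, t_final, steps=2000):
    # Classical fourth-order Runge-Kutta integration of d(rho)/dt = rhs(rho).
    rho, dt = rho0.copy(), t_final / steps
    for _ in range(steps):
        k1 = rhs(rho)
        k2 = rhs(rho + 0.5 * dt * k1)
        k3 = rhs(rho + 0.5 * dt * k2)
        k4 = rhs(rho + dt * k3)
        rho = rho + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return rho

rho0 = np.array([[0.8, 0.3 - 0.2j], [0.3 + 0.2j, 0.2]], dtype=complex)
t = 1.5
rho_t = evolve(rho0, t)

T1, T2 = 1 / (2 * alpha**2), 1 / (alpha**2 + 2 * beta**2)
expected_pp = 0.5 + np.exp(-t / T1) * (rho0[0, 0] - 0.5)   # rho_++(t)
expected_pm = rho0[0, 1] * np.exp(-t / T2)                 # rho_+-(t)
print(np.isclose(rho_t[0, 0], expected_pp), np.isclose(rho_t[0, 1], expected_pm))
```

With these values $2\beta^2 > \alpha^2$, so the coherences decay faster than the populations, matching the $T_2 < T_1$ ordering described in the text.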

In summary, we've now concluded the study of a simple open system (nuclear magnetic resonance for B = 0) using 
the Lindblad operators. Basically, this is a nice way of approaching a problem without needing to understand all of the 


dynamics of the whole quantum mechanical system. 


46 Density Matrices: Measurement 


Now that we've discussed some interesting ideas of density matrices, we'll reexamine the problem of measurement in quantum mechanics. There are lots of questions that come up at the foundational level. So far, we've been following the Copenhagen interpretation of quantum mechanics, developed from 1925-1927. Here are the main points of that interpretation:


• States evolve unitarily via the Schrödinger equation.

• Measurements can be described mathematically in a simple way: states are projected (non-unitarily) by measurements into invariant spaces of observables (Hermitian operators), such as eigenspaces.

• The possible values of a measurement are the eigenvalues of the corresponding operator, with probabilities given by the Born rule.


(There is no uncertainty once we make a measurement, and we're taking all of these as axioms of the theory.) 
We've discussed measurement in various ways — measuring along a basis, looking at a partial space or subspace of the 
Hilbert space, and so on. But the interesting point is that of measurement being non-unitary: what is this specific 
measurement apparatus doing which is different from a normal evolution?

Despite lots of work and many debates, not very much insight has been obtained here, but it's still worth considering these questions to get a better understanding. One way in which this happens is in the reading of Bohr and Heisenberg's original papers, trying to understand how they understood these concepts.


Fact 330 


The orthodox reading of the Copenhagen interpretation is that our measuring devices are classical: this means 


that the fundamental laws don’t actually apply to them in some way. 


Often, this type of reading is associated with a Heisenberg cut between the quantum and classical domains: the 
idea is that at the microscopic level, quantum mechanics takes effect, but we need classical mechanics to make sense 


of all of the measurements. 


Fact 331 


But the modern reading is that the Heisenberg cut doesn't really make sense: in fact, we can now build very 


large quantum systems, where we have a billion charge carriers in a superposition of two different states. 


In other words, classical physics is now essentially thought of as “what quantum physics looks like at large scales:” 
there aren't fundamental differences in the two domains. 

So there are a few proposals for how to interpret measurement in this framework, and we'll discuss one that has 
to do with decoherence, as well as one centered around the many-worlds interpretation. 

We'll start with a more accessible question: what does it mean for the wavefunction to collapse? It should
be possible to look inside of our measurement apparatus and see when this non-unitary transformation occurs, and 


perhaps that will give us a clearer picture. 


Example 332 


Suppose we are trying to detect a photon by using a photomultiplier tube. 


Basically, a photon can go into a box, and there is a cathode (electrically charged) near the entrance. The photon 
will hit the cathode, which will release an electron because of the photoelectric effect, and this electron will hit another 
plate along the box, which ejects more electrons. This process continues to the anode, and by this point there are 
many, many electrons — we will have a macroscopic current. So then our photomultiplier tube will be able to detect a 
photon when we measure a nonzero current. 

The direction of the incoming photon beam is not completely certain here — if we have a few different boxes next 
to each other, then there is a superposition of different states that this photon could be in (based on which detector 
it entered). But only one of these detectors will actually go off, and when that happens, we will have collapsed the 


wavefunction. 


Example 333 


Suppose we have a calcite crystal (in which the index of refraction depends on the angle and polarization of the 


incoming beam). 


Then when a photon enters this crystal, it can exit in one of two possible basis states: $|H\rangle$, corresponding to the exit angle from a horizontal polarization, or $|V\rangle$, corresponding to the exit angle from a vertical polarization. Then when we send a photon in, it can be in an arbitrary superposition of the $|H\rangle$ and $|V\rangle$ states, and when it comes out, it doesn't actually need to be in one of those two basis states. In fact, the wavefunction will still be spread out over the possible angles (it still lives in some superposition, and this is known as pre-measurement), but once we put a photon detector along each of the $|H\rangle$ and $|V\rangle$ directions, exactly one of the two detectors will go off, and the position is known. So having this measurement apparatus forces the wavefunction to collapse, and the act of measuring with our detectors is what we usually mean when we say that we “measure along a basis.”


Example 334 


Suppose we want to measure the momentum of a charged particle. 


We can send in the particle through a small slit in a wall: at that point, it can be in a superposition of many 
momentum states. If we apply a uniform magnetic field, then the Lorentz force is proportional to the particle's velocity,
so it will bend into a circular orbit. But again, the act of interacting with the magnetic field does not constitute a 
measurement — our particle is put into a superposition of orbits, and it isn’t until the particle curves back and hits a 
detector in the wall that we know the velocity, and that’s when the wavefunction collapses. 

These three examples are all a bit different from the measurements we've been talking about earlier in this class, 
though, where we end up in an eigenstate and will get the same result if we measure again and again. In the examples 
above, the particle is actually destroyed or irreparably changed, which is why some other experiments, known as 
quantum non-demolition measurements, have also been considered. 

So we've now thought about how measurements can be done in a few experimental setups, and now we'll think 
about how these measurements can be established quantum mechanically. The ideas here are due to von Neumann 
— it doesn’t really remove the mystery of measurement, but it does explicitly suggest how certain systems actually 


behave. 


Example 335 


Suppose we have a system S and an apparatus A, where S and A interact with each other. A is a quantum system 


with pointer states (for example, in a Stern-Gerlach system, they could point to $+z$ or $-z$).


Even if the apparatus may be macroscopic, we'll still think of it as a quantum system. Say that our system S has an observable $O_S$ and eigenvectors $|s_i\rangle$ for it, such that we have a finite number of possible states:

$$O_S |s_i\rangle = s_i |s_i\rangle, \qquad 1 \le i \le n.$$

In other words, we wish to “measure” with $O_S$ to see which of the $n$ possible states we're living in. So now we can say that our apparatus A has an observable $O_A$ and pointer states $|a_j\rangle$, such that

$$O_A |a_j\rangle = a_j |a_j\rangle, \qquad 1 \le j \le m, \qquad m \ge n.$$

Basically, we want each pointer state to correspond to a configuration $|s_i\rangle$ of our system, so we must have at least as many pointer states as we have $O_S$ eigenstates.
At time $t = 0$, we must be in some state $|\psi(0)\rangle$ for our system, and this is in some superposition of the basis states:

$$|\psi(0)\rangle = \sum_{i=1}^{n} c_i |s_i\rangle.$$

But because our system is connected to an apparatus, we should really be thinking about this in terms of the composite system: then we have the apparatus in some initial state, meaning we can write

$$|\Psi(0)\rangle_{SA} = \left( \sum_i c_i |s_i\rangle \right) \otimes |\phi(0)\rangle_A.$$

But we want to design the apparatus in a way so that there is an interaction between S and A, so there is also an interaction Hamiltonian $H_{SA}$.


Proposition 336 


We can pick a Hamiltonian such that at some later time $\tau$, $|\Psi(\tau)\rangle$ has achieved pre-measurement, meaning that we are in an entangled state

$$|\Psi(\tau)\rangle = \sum_i c_i e^{i\varphi_i} |s_i\rangle \otimes |a_i\rangle,$$

where $\varphi_i$ is an arbitrary phase.


What we're saying is that we've essentially coupled the system with the measurement apparatus: if the state of the system is $|s_i\rangle$, then our apparatus is in the $|a_i\rangle$ state. And this is really as far as we can go: we can create this correlation, and then the apparatus allows us to measure things at a classical level, but it doesn't solve the mystery of the non-unitary transformation.

Instead of proving in general that we can go from $|\Psi(0)\rangle$ to $|\Psi(\tau)\rangle$ by picking some appropriate Hamiltonian or unitary time-evolution $U$, we'll do an interesting example:


Example 337 
Suppose S and A both have Hilbert space $V = \mathbb{C}^2$ (that is, the spin 1/2 vector space), where the operators are

$$O_S = \sigma_z^S, \qquad O_A = \sigma_z^A.$$

We claim that the Hamiltonian

$$H_{SA} = \frac{1}{2} \hbar\omega\, (1 + \sigma_z^S) \otimes \sigma_x^A$$

will be able to establish the desired interaction between S and A. Let's start with the wavefunction

$$|\Psi(0)\rangle_{SA} = \left( c_+ |+\rangle_S + c_- |-\rangle_S \right) \otimes |-\rangle_A$$

(so we start off in a single state of the apparatus, just like in the discussion above). Then we need to figure out the unitary time-evolution of this state, so that the $|+\rangle_S$ and $|-\rangle_S$ line up in the system and apparatus. Since we have a $\sigma_x^A$ in the Hamiltonian, it's convenient to rewrite

$$|-\rangle_A = \frac{1}{\sqrt{2}} \left( |x;+\rangle_A - |x;-\rangle_A \right),$$

and then plugging this back in, we can rewrite our initial state as

$$\boxed{ |\Psi(0)\rangle_{SA} = \frac{1}{\sqrt{2}} \left( c_+ |+\rangle_S |x;+\rangle_A - c_+ |+\rangle_S |x;-\rangle_A \right) + c_- |-\rangle_S |-\rangle_A. }$$

Let's now see how the Hamiltonian given evolves our composite system: the unitary time-evolution operator is

$$U(t) = e^{-iH_{SA}t/\hbar} = \exp\left( -\frac{i}{2} \omega t\, (1 + \sigma_z^S) \otimes \sigma_x^A \right),$$

which means that our state at a later time is

$$|\Psi(t)\rangle_{SA} = U(t) |\Psi(0)\rangle_{SA}.$$


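Note that in the last term of the boxed expression above, because the system is in the $-$ state (and therefore has an eigenvalue of $-1$ for $\sigma_z^S$), the factor $1 + \sigma_z^S$ vanishes and the exponential reduces to the identity, and therefore the last term is left invariant. Since this last term is of the desired “coupled” form that we are looking for, we've chosen a good Hamiltonian, and now we just need to apply the unitary operator on the first two terms. Notice that because our states are already eigenstates of the unitary operator, we have

$$U(t) |+\rangle_S |x;\pm\rangle_A = e^{\mp i\omega t} |+\rangle_S |x;\pm\rangle_A,$$

so

$$|\Psi(t)\rangle_{SA} = \frac{1}{\sqrt{2}} c_+ \left( e^{-i\omega t} |+\rangle_S |x;+\rangle_A - e^{i\omega t} |+\rangle_S |x;-\rangle_A \right) + c_- |-\rangle_S |-\rangle_A.$$

And now we just rewrite everything in terms of the z-eigenstates: rewriting $|x;\pm\rangle$ as $\frac{1}{\sqrt{2}} (|+\rangle \pm |-\rangle)$, we find that

$$|\Psi(t)\rangle_{SA} = \frac{1}{2} c_+ \left( e^{-i\omega t} - e^{i\omega t} \right) |+\rangle_S |+\rangle_A + \frac{1}{2} c_+ \left( e^{-i\omega t} + e^{i\omega t} \right) |+\rangle_S |-\rangle_A + c_- |-\rangle_S |-\rangle_A,$$

which we can rewrite as

$$|\Psi(t)\rangle_{SA} = -i \sin(\omega t)\, c_+ |+\rangle_S |+\rangle_A + \cos(\omega t)\, c_+ |+\rangle_S |-\rangle_A + c_- |-\rangle_S |-\rangle_A.$$

The $|+\rangle_S |-\rangle_A$ is the bad term that we're trying to remove through time evolution, and indeed after some time $t^* = \frac{\pi}{2\omega}$, the cross term goes away, and we'll have

$$|\Psi(t^*)\rangle_{SA} = -i c_+ |+\rangle_S |+\rangle_A + c_- |-\rangle_S |-\rangle_A.$$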


This means that after a time of t* spent evolving unitarily, our states in the system and apparatus have been entangled 
perfectly (up to some changes in phase, but not amplitude). And now if we measure our apparatus to be in the |+) 
state, we will also find the system in the |+) state, and same with |—). 
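The pre-measurement claim can be verified numerically. This is a minimal sketch, assuming $\hbar = 1$, $\omega = 1$, and arbitrary normalized amplitudes $c_\pm$; the helper `evolve` is illustrative and exponentiates the Hermitian Hamiltonian through its eigendecomposition.

```python
import numpy as np

omega = 1.0                       # illustrative value; hbar = 1 (assumption)
c_plus, c_minus = 0.6, 0.8j       # arbitrary normalized amplitudes (assumption)
I2 = np.eye(2)
sigma_z = np.diag([1.0, -1.0])
sigma_x = np.array([[0.0, 1.0], [1.0, 0.0]])
plus, minus = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# H_SA = (1/2) hbar omega (1 + sigma_z^S) x sigma_x^A, system factor first.
H = 0.5 * omega * np.kron(I2 + sigma_z, sigma_x)

def evolve(psi, t):
    # exp(-iHt) built from the eigendecomposition of the Hermitian matrix H.
    evals, evecs = np.linalg.eigh(H)
    return evecs @ (np.exp(-1j * evals * t) * (evecs.conj().T @ psi))

psi0 = np.kron(c_plus * plus + c_minus * minus, minus).astype(complex)
t_star = np.pi / (2 * omega)
psi_star = evolve(psi0, t_star)

# Expected pre-measurement state: -i c_+ |+>|+> + c_- |->|->.
target = -1j * c_plus * np.kron(plus, plus) + c_minus * np.kron(minus, minus)
print(np.allclose(psi_star, target))
```

At intermediate times the $|+\rangle_S|-\rangle_A$ component is still present with amplitude $\cos(\omega t)\, c_+$, so the perfect correlation appears only at $t^*$ (and at later times when $\cos(\omega t)$ vanishes again).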

At the end of the day, though, our system still hasn’t actually found a way to collapse into one of the two states: 
we've reached pre-measurement, but we haven't solved the mystery of how the measurement is actually made. So our 


focus now will be on the modern viewpoint of this issue and the crux of this measurement problem at hand. 


Fact 338 


Here is where decoherence really comes into play: remember that an open system does not evolve unitarily, so


we can try to claim that the non-unitary nature of projectors comes from the non-unitary evolution of an open 


system. 


To expand on this idea, suppose that our SA composite system is now connected to an environment E. Then

$$|\Psi(t^*)\rangle_{SA} = \sum_i c_i |s_i\rangle \otimes |a_i\rangle$$

is in the pre-measurement state, but when we introduce the environment, we can now write the states as

$$|\Psi(t^*)\rangle_{SAE} = \sum_i c_i |s_i\rangle \otimes |a_i\rangle \otimes |e_i\rangle.$$


The $|e_i\rangle$ states can be very different from each other, but what we have is still a pure state for SAE. To introduce something interesting (and get the decoherence in the picture), let's look at the density matrix of SA, which is

$$\rho_{SA} = \operatorname{tr}_E \rho_{SAE} = \operatorname{tr}_E \sum_{i,j} c_i c_j^* |s_i\rangle |a_i\rangle |e_i\rangle \langle s_j| \langle a_j| \langle e_j|.$$

This then simplifies (with the usual rule) to

$$\sum_{i,j} c_i c_j^* |s_i\rangle |a_i\rangle \langle s_j| \langle a_j| \, \langle e_j | e_i \rangle.$$

Because there are many degrees of freedom in the environment, we can approximate $\langle e_j | e_i \rangle$ as $\delta_{ij}$ (one possible explanation is that pointer states could couple to different orthogonal environment states, and another is that the total overlap ends up being small), and thus this expression becomes

$$\rho_{SA} = \sum_i |c_i|^2 |s_i\rangle |a_i\rangle \langle s_i| \langle a_i|,$$

and given the normalization of our states, we must have $\sum_i |c_i|^2 = 1$. So this is indeed a valid density matrix, corresponding to an ensemble

$$E = \{ (|c_1|^2, |s_1\rangle |a_1\rangle), \ldots, (|c_n|^2, |s_n\rangle |a_n\rangle) \}.$$

In words, this means that if we don't know about the environment, then upon measurement our SA composite system has a probability $|c_i|^2$ of being in the state $|s_i\rangle \otimes |a_i\rangle$. (So density matrices indeed give a motivation for why we have the familiar-looking probabilities!)
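To see how the overlap $\langle e_j | e_i \rangle$ controls the off-diagonal terms, here is a minimal sketch with two system states, two pointer states, and a two-dimensional environment. The amplitudes and the function name `rho_SA` are illustrative assumptions; the sketch compares orthogonal environment states with identical ones.

```python
import numpy as np

c = np.array([0.6, 0.8j])   # amplitudes c_i, normalized (assumption)
n = len(c)

def rho_SA(env_states):
    # |Psi> = sum_i c_i |s_i>|a_i>|e_i>, stored as a tensor with indices (s, a, e).
    # Row i of env_states holds the components of |e_i>.
    psi = np.zeros((n, n, env_states.shape[1]), dtype=complex)
    for i in range(n):
        psi[i, i, :] += c[i] * env_states[i]
    # Partial trace over E: rho[(s,a),(s',a')] = sum_e psi[s,a,e] * conj(psi[s',a',e]).
    rho = np.einsum("sae,tbe->satb", psi, psi.conj())
    return rho.reshape(n * n, n * n)

orthogonal_env = np.eye(2)                      # <e_j|e_i> = delta_ij
same_env = np.array([[1.0, 0.0], [1.0, 0.0]])   # |e_1> = |e_2>, no decoherence

rho_decohered = rho_SA(orthogonal_env)
rho_coherent = rho_SA(same_env)
purity_decohered = np.trace(rho_decohered @ rho_decohered).real
purity_coherent = np.trace(rho_coherent @ rho_coherent).real
print(purity_decohered, purity_coherent)
```

With orthogonal environment states the reduced matrix is diagonal with entries $|c_i|^2$, while with identical environment states the coherences $c_i c_j^*$ survive and SA stays pure.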


Example 339 


One instance in which we might have seen a system like this is Schrödinger's cat.


In such a system, it may seem plausible to start with a superposition

$$\frac{1}{\sqrt{2}} \left( |\text{alive}\rangle + |\text{dead}\rangle \right) \otimes |E_0\rangle$$

where the cat is either alive or dead (Schrödinger describes a contraption which puts the cat in this state), and there's definitely an environment around the cat. But it doesn't actually make sense to have the same $|E_0\rangle$ environment state attached to both the live and dead cat: the live cat (for example) needs to breathe, so it interacts with the environment in a different way from the dead cat. Therefore, we will eventually end up (after basically any instant in time) in the state

$$\frac{1}{\sqrt{2}} \left( |\text{alive}\rangle |E_1\rangle + |\text{dead}\rangle |E_2\rangle \right).$$

So now if we assume that $E_1$ and $E_2$ are orthogonal, the density matrix of the cat should be

$$\rho_{\text{cat}} = \frac{1}{2} |\text{alive}\rangle \langle \text{alive}| + \frac{1}{2} |\text{dead}\rangle \langle \text{dead}|,$$

and now it seems to make sense that decoherence can lead us to a mixed state.

Unfortunately, what we've been discussing with decoherence actually raises more questions than it answers: introducing the environment still yields some issues with measurement. It's not clear that the environment states $|e_i\rangle$ need to couple to the SA states in the way that they do. Instead, it's possible that they couple to linear combinations of the SA states, in which case we have a different-looking density matrix. In addition, we know that different-looking ensembles can give the same density matrix; that is, the resulting ensemble $E$ can be ambiguous.

So now we'll turn our attention to the other idea, which is the many-worlds interpretation, proposed by Everett. In this theory, the wavefunction doesn't collapse in the same way that it does in our previous discussion. Instead, upon measurement, the universe splits (based on the result of that measurement).

Example 340

Suppose Alice has a spin 1/2 particle in the state

$$|\psi\rangle = c_+ |+\rangle + c_- |-\rangle.$$

As always, $|c_+|^2 + |c_-|^2 = 1$, so the Copenhagen interpretation tells us that we have a $|c_+|^2$ probability of ending up in the $|+\rangle$ state and a $|c_-|^2$ probability of ending up in the $|-\rangle$ state after a measurement along the z-axis. The many-worlds interpretation accounts for this by requiring us to include the measurement apparatus (and in particular Alice) in the wavefunction, so that we have

$$|\Psi\rangle = \left( c_+ |+\rangle + c_- |-\rangle \right) \otimes |\text{Alice}\rangle.$$

Then when Alice does a measurement, we claim that the state evolves into

$$|\Psi\rangle \to c_+ |+\rangle |\text{Alice sees } +\rangle + c_- |-\rangle |\text{Alice sees } -\rangle.$$


So Alice is “acting like a pointer state” like in the von Neumann argument above, and from here the idea is that there
are two independent branches of the universe: in one of them, Alice sees + and the state is in |+), and in the other, 
Alice sees — and the state is in |—). Those two branches then never interact with each other, so further experiments 
in each branch will just keep splitting our universe into different paths. 

So if everything happens in some path of the universe, we need another interpretation of probability: one argument 
is that before Alice observes the measurement, she has some self-location probability of ending up in the different 


branches, dictated by the coefficients $c_\pm$. But this idea is a big conceptual departure from what we've been discussing


so far — we don’t really know what we're talking about when we write down a ket like |Alice sees +), and we need 
more evidence to make this a valuable theory. (If we read the literature and compare the ideas, we can think through 
these thoughts ourselves as well.) 

To finish off the class, we'll discuss the topic of quantum computation, an area of ongoing research. Basically, 
quantum computers are able to do computations in a different way from normal computers, exploiting the properties 


of superposition and interference, which often makes computations go faster. 


Definition 341 


A bit is the basic unit of information: it is an object with two possible states, 0 and 1. A qubit is a quantum 


object with two possible basis states, |0) and |1). 


As we've mentioned before, there are infinitely many possible states that a qubit can be in (corresponding to the 
different linear superpositions of |0) and |1)), but only finitely many states that a bit can be in. A qubit can be created 
in many physical manifestations — a spin 1/2 particle, a particle in a potential with two energy levels, and so on — but 


the point is that we'll be using qubits to do calculations faster than bits. 


Fact 342 


In 2019, Google did a computation with a 53-qubit computer that took 200 seconds, which a normal computer 


takes a few days to do. 


One concept stronger than just “being faster” is quantum supremacy, which is the idea that a quantum computer can solve problems that normal computers cannot solve in any feasible amount of time. The idea of having a “programmable computer” (made precise by Turing, so the resulting models are called Turing machines) is connected to the Church-Turing thesis, which says that any algorithm we can do with a computer can be done with a Turing machine (the simplest possible programmable computer). This thesis is widely believed, but there's a stronger version of it which claims that any algorithm on any computer can be recreated efficiently on that Turing machine.

In other words, if we have a problem of (for example) input size N, an efficient algorithm takes a polynomial 
number of operations in N. (A non-efficient algorithm would be, for example, one that takes exponential time.) So the 
stronger Church-Turing thesis basically says that we can be equally efficient with a computer and a Turing machine, 
and the question here is whether this is true for quantum computers as well. At the moment, it does seem like
quantum computation may be able to do calculations (like prime factorization) efficiently, while classical computers
cannot. But a quantum computer has limitations due to decoherence, and at some level this is unavoidable (so we
need to build in error correction). That means that quantum computation algorithms are more complicated, and thus 
we don't actually know how much that error correction will affect the efficiency of our algorithm. 

What we'll spend time on here is to understand theoretically how a quantum computer takes advantage of 
superposition, and one main idea is that we can simulate many quantum particles (which is very difficult in a classical 
computer). 


Let's start with the qubits themselves. By convention, the notation we often use here is

$$|0\rangle = |z;+\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |1\rangle = |z;-\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

(These are sometimes called the computational basis states.) Then a general arbitrary state of the qubit is

$$|\psi\rangle = a_0 |0\rangle + a_1 |1\rangle = \begin{pmatrix} a_0 \\ a_1 \end{pmatrix},$$

where $a_0, a_1 \in \mathbb{C}$ and $|a_0|^2 + |a_1|^2 = 1$, as usual. We can measure the value of the qubit along a basis, which corresponds to “reading the bit and seeing if it is 0 or 1:” if we measure along the computational basis states, we get $|0\rangle$, corresponding to the bit 0, or $|1\rangle$, corresponding to the bit 1.


So now, suppose we have two qubits instead of one: we can describe a general state as

$$|\Psi\rangle = a_{00} |0\rangle \otimes |0\rangle + a_{01} |0\rangle \otimes |1\rangle + a_{10} |1\rangle \otimes |0\rangle + a_{11} |1\rangle \otimes |1\rangle,$$

where $a_{ij} \in \mathbb{C}$ and $\sum_{i,j} |a_{ij}|^2 = 1$. To make the notation a little nicer, we'll just rewrite this as

$$|\Psi\rangle = a_{00} |00\rangle + a_{01} |01\rangle + a_{10} |10\rangle + a_{11} |11\rangle,$$

and now we have four computational basis states (corresponding to the tensor product of the basis states of the individual qubits). And again, measuring along this basis means that we can read the two bits as 00, 01, 10, or 11.


One useful rule to keep in mind here is that we can identify states with binary numbers, so that

$$|00\rangle \leftrightarrow 0, \quad |01\rangle \leftrightarrow 1, \quad |10\rangle \leftrightarrow 2, \quad |11\rangle \leftrightarrow 3$$

in this case. And we can generalize this to a system of $n$ qubits, which corresponds to a tensor product space of dimension $N = 2^n$: we write the computational basis states here as

$$|x_1\rangle \otimes \cdots \otimes |x_n\rangle = |x_1 \cdots x_n\rangle, \quad x_i \in \{0, 1\}.$$

And then a sequence $x_1 \cdots x_n$ represents a binary number, which then corresponds to a nonnegative integer:

$$|x_1 \cdots x_n\rangle \leftrightarrow (x_1 \cdots x_n)_2 \in [0, 2^n - 1].$$

A typical state of this state space is then

$$|\Psi\rangle = a_{0 \cdots 0} |0 \cdots 0\rangle + a_{0 \cdots 01} |0 \cdots 01\rangle + \cdots + a_{1 \cdots 1} |1 \cdots 1\rangle,$$

where there are $N = 2^n$ total coefficients that need to be stored.
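To make the labeling concrete, here is a minimal Python sketch (the helper names `basis_index`, `kron`, and `basis_state` are our own, not part of the notes) that builds a computational basis state as a tensor product and confirms that its single nonzero entry sits at the position given by the binary number.

```python
def basis_index(bits):
    """Map a bit string such as '101' to the integer it encodes in binary."""
    index = 0
    for b in bits:
        index = 2 * index + int(b)
    return index


def kron(u, v):
    """Tensor (Kronecker) product of two state vectors stored as lists."""
    return [a * b for a in u for b in v]


def basis_state(bits):
    """Build |x_1 ... x_n> as a length-2^n vector by tensoring single-qubit kets."""
    ket = {"0": [1, 0], "1": [0, 1]}
    state = [1]
    for b in bits:
        state = kron(state, ket[b])
    return state


state = basis_state("101")
print(len(state), state.index(1), basis_index("101"))
```

The first qubit is treated as the most significant bit, matching the ordering of the basis states used in these notes.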

So here we can see the drastic difference already: if we consider a 53-qubit quantum computer like the one Google used, the state space has dimension $2^{53}$, which is on the order of $10^{16}$. So specifying a state of a 53-bit classical computer requires us to write down a list of 53 numbers, each of which is 0 or 1, while specifying a state of a 53-qubit quantum computer requires us to write down a list of $2^{53}$ numbers, each of which is some complex number! So the memory needed to store the state of a quantum computer grows exponentially with the number of qubits.


Remark 343. If we store each coefficient using 8 bytes (a byte is 8 bits), we'll need $2^{53} \cdot 2^3 = 2^{56}$ bytes just to specify a single state. This is 64 petabytes, which is about a quarter of IBM's largest supercomputer's storage. So working with this state is very difficult: just going from 53 qubits to 60 qubits means that our supercomputers can't deal with this anymore, especially when we need to time-evolve the state forward as well.
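The arithmetic in this remark is easy to reproduce. The following is a small sketch (the function name `state_bytes` is ours, and we assume 8 bytes per coefficient as in the remark) showing how quickly the storage grows.

```python
def state_bytes(n_qubits, bytes_per_coefficient=8):
    """Bytes needed to store all 2^n coefficients of an n-qubit state."""
    return (2 ** n_qubits) * bytes_per_coefficient


PEBIBYTE = 2 ** 50  # "petabyte" in the binary sense used in the remark

for n in (10, 30, 53, 60):
    size = state_bytes(n)
    print(f"{n} qubits: {size} bytes = {size / PEBIBYTE:.3g} PiB")
```

Going from 53 to 60 qubits multiplies the requirement by $2^7 = 128$.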


So now we want to do operations on our qubits: these are the calculations that make normal computation possible, and they're interesting to study in the quantum case as well.


Definition 344 


A (quantum) gate is a unitary operator on qubits. 


Example 345 


The simplest gates act on a single qubit, meaning that they are unitary operators on a 2-dimensional vector space.


Remember that the Pauli matrices are Hermitian and square to the identity matrix $I$, so they are also unitary: thus, we have the matrices

$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

Notice, for example, that

$$X|0\rangle = |1\rangle, \qquad X|1\rangle = |0\rangle$$

(don't forget that we're zero-indexing), so this can be described as the NOT gate, which reverses a bit. Another way to write this is that the output of the NOT gate is

$$\bar{x} = \text{NOT}(x) = x \oplus 1,$$

where the $\oplus$ symbol means addition mod 2. We like to represent these gates with diagrams: here's a representation of the NOT gate.

$$|x\rangle \longrightarrow \boxed{X} \longrightarrow |\bar{x}\rangle$$

(In one sentence, the $X$ gate takes $|x\rangle$ into $|\bar{x}\rangle = |x \oplus 1\rangle$.) In the classical case, we can only have an input of 0 or 1, but in the quantum case, we can also send in some superposition and find its NOT value:

$$a_0 |0\rangle + a_1 |1\rangle \xrightarrow{\;X\;} a_0 |1\rangle + a_1 |0\rangle.$$

(Basically, linearity means that it's easy to write down the result of any superposition of $|0\rangle$ and $|1\rangle$.) We can also write down the action of the $Y$ and $Z$ gates, but there's another gate which is more interesting, known as the Hadamard gate $H$:

$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$

Then we can calculate

$$H|0\rangle = \frac{1}{\sqrt{2}} (|0\rangle + |1\rangle), \qquad H|1\rangle = \frac{1}{\sqrt{2}} (|0\rangle - |1\rangle),$$

and we will see soon why this is useful.
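As a quick numerical check of these single-qubit gates, here is a minimal sketch in plain Python (the helper `apply` and the list-of-lists matrix representation are our own choices for illustration).

```python
import math

S = 1 / math.sqrt(2)
X = [[0, 1], [1, 0]]    # NOT gate
H = [[S, S], [S, -S]]   # Hadamard gate


def apply(gate, state):
    """Multiply a 2x2 gate into a length-2 state vector."""
    return [sum(gate[i][j] * state[j] for j in range(2)) for i in range(2)]


ket0, ket1 = [1, 0], [0, 1]

flipped = apply(X, [0.6, 0.8])   # a_0|0> + a_1|1>  ->  a_1|0> + a_0|1>
plus = apply(H, ket0)            # (|0> + |1>)/sqrt(2)
minus = apply(H, ket1)           # (|0> - |1>)/sqrt(2)
back = apply(H, plus)            # applying H twice returns |0>

print(flipped, plus, minus, back)
```

The last line illustrates that $H^2 = I$, up to floating-point rounding.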


Example 346 


Now let's look at unitary operators on two qubits, which means we now need unitary matrices acting on a 4-dimensional vector space.


Here, it is still possible to visualize the $4 \times 4$ matrices, but we'll need to describe them with clear language. One well-known gate is the controlled NOT or cNOT gate, and in the classical case, it looks like this:

$$x \longrightarrow x, \qquad y \longrightarrow y \oplus x.$$

Here, $x$ is known as the “control bit,” while $y$ is known as the “target bit”: the control bit remains unchanged, but $y$ is changed based on the value of $x$. To transfer this into the quantum case, we will need a control and target qubit, and we'll transform on the basis states in the exact same way:

$$|x\rangle \longrightarrow |x\rangle, \qquad |y\rangle \longrightarrow |y \oplus x\rangle.$$

This is called a “controlled NOT” gate, because it acts like a NOT whenever the control qubit is $|1\rangle$, but it does nothing when the control qubit is $|0\rangle$. And remember that the quantum gate only acts like this on computational basis states: we get the rest by linearity.


Since the two qubits live in a tensor product space, another way to describe the action here is that

$$|x\rangle \otimes |y\rangle \xrightarrow{\;\text{cNOT}\;} |x\rangle \otimes |y \oplus x\rangle.$$

We can ask whether the cNOT gate is unitary, and the way to check this is to look at the matrix representation of the gate: choose basis vectors in the tensor product space to be

$$|00\rangle, \; |01\rangle, \; |10\rangle, \; |11\rangle$$

in that order (so that the numbers are ascending in binary), and notice that the basis elements are sent to

$$|00\rangle, \; |01\rangle, \; |11\rangle, \; |10\rangle$$

respectively. So the matrix corresponding to cNOT is

$$U_{\text{cNOT}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & X \end{pmatrix},$$

which is in block form. And this block matrix form is a nice way of thinking about the cNOT gate intuitively (the action depends on the state of the first bit), and now this matrix is indeed unitary because it is Hermitian and squares to the identity matrix $I$ (by block multiplication).
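The matrix can also be generated directly from the rule $|x, y\rangle \mapsto |x, y \oplus x\rangle$. The sketch below (helper names are ours) builds it that way and checks the two properties used in the unitarity argument: the matrix is symmetric and real (hence Hermitian), and it squares to the identity.

```python
def cnot_matrix():
    """Build the 4x4 cNOT matrix in the basis |00>, |01>, |10>, |11>."""
    U = [[0] * 4 for _ in range(4)]
    for x in (0, 1):
        for y in (0, 1):
            col = 2 * x + y          # index of the input state |x, y>
            row = 2 * x + (y ^ x)    # index of the output state |x, y XOR x>
            U[row][col] = 1
    return U


def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(A):
    return [list(row) for row in zip(*A)]


U = cnot_matrix()
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

print(U)
print(transpose(U) == U, matmul(U, U) == identity)
```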


Example 347 


Our next step is to try to construct a function using these types of unitary gates. 


For example, we may want to take one of the four (classical) functions $f : \{0,1\} \to \{0,1\}$, which sends each bit to some bit, and write it in terms of our quantum gates. In other words, does there exist a gate $G$ such that

$$G |x\rangle = |f(x)\rangle?$$

Somewhat surprisingly, it's not always possible to do this! One function that we can construct is the identity function (where $f(0) = 0$, $f(1) = 1$), since we can just act with the identity matrix, which is certainly a unitary operator. Similarly, we can construct the function such that $f(0) = 1$, $f(1) = 0$ by using the NOT gate that we described above. But the other two functions are constant functions (which send either bit to 0 or to 1), and that's not a unitary transformation (since it's not invertible), so we can't find a gate $G$ that does the job here. That means that not all functions $f$ can be “unitarily implemented” like this for even this simple domain and range, and thus we should not expect to be able to represent functions $f$ in general: after all, unitary operators are always injective.

So if we want to be able to use gates to implement a function $f$ and do general computation, all such computations must be reversible. One trick to address this issue is to enlarge the state space: even when our function $f$ only acts on a single qubit, we can use a gate $U_f$ that takes in two inputs and also spits out two outputs. The inputs $|x\rangle$ and $|y\rangle$ become the outputs $|x\rangle$ and $|y \oplus f(x)\rangle$.

In other words, this is the unitary operator

$$U_f \, |x\rangle \otimes |y\rangle = |x\rangle \otimes |y \oplus f(x)\rangle,$$

and now $|x\rangle$ serves as a kind of “control bit” for the function because it's unchanged, but it is also the bit that is being evaluated by $f$. A slightly cleaner way of writing the above equation is

$$U_f |x, y\rangle = |x, y \oplus f(x)\rangle,$$


and we'll choose to use this kind of notation from here. (Notice that when $y = 0$, we have $U_f |x, 0\rangle = |x, f(x)\rangle$, so plugging in $y = 0$ will give us back the function reading $f(x)$ on the bottom bit.)

Indeed, this function is reversible. In fact, applying it twice yields

$$U_f U_f |x, y\rangle = U_f |x, y \oplus f(x)\rangle = |x, y \oplus f(x) \oplus f(x)\rangle,$$

but $f(x) \oplus f(x)$ is always 0 mod 2 (whether $f(x) = 0$ or 1), so this is just $|x, y\rangle$ back again. So this gate squares to the identity (meaning it is its own inverse), and we want to make sure it's Hermitian as well. We can do this by constructing the matrix representation again, but let's try something different this time: note that we can write $U_f$ in terms of “matrix elements” by considering the operator

$$U_f = \sum_{(x,y) \in \{0,1\}^2} |x, y \oplus f(x)\rangle \langle x, y|.$$

(We can check that this has the correct action on each of the computational basis states, because they are orthonormal.) Then the Hermitian conjugate flips the kets and the bras, so we now have

$$U_f^\dagger = \sum_{(x,y) \in \{0,1\}^2} |x, y\rangle \langle x, y \oplus f(x)|.$$

To show that $U_f^\dagger = U_f$, we do a change of variables: let $x' = x$, $y' = y \oplus f(x)$ (we can check that this is a reversible change of variables, because it is injective). Then $y = y' \oplus f(x')$ as well (because everything is taken mod 2), meaning we can rewrite the above expression:

$$U_f^\dagger = \sum_{(x',y') \in \{0,1\}^2} |x', y' \oplus f(x')\rangle \langle x', y'|,$$

which is the same expression as $U_f$ with different dummy variables. Thus $U_f$ is indeed Hermitian (so combined with the above information, it's unitary), and we've now found a way to describe our function $f$ using a unitary gate, just by using a larger state space.
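The argument above can be checked by brute force for all four one-bit functions. This is a minimal sketch (the dictionary `FUNCTIONS` and the helper names are ours): it builds the $4 \times 4$ matrix of $U_f$ from the rule $|x, y\rangle \mapsto |x, y \oplus f(x)\rangle$ and confirms that it is symmetric and squares to the identity, even for the two constant functions that could not be implemented on a single qubit.

```python
from itertools import product

FUNCTIONS = {
    "constant 0": lambda x: 0,
    "constant 1": lambda x: 1,
    "identity": lambda x: x,
    "not": lambda x: 1 - x,
}


def u_f_matrix(f):
    """Matrix of U_f in the basis |00>, |01>, |10>, |11> (index 2x + y)."""
    U = [[0] * 4 for _ in range(4)]
    for x, y in product((0, 1), repeat=2):
        U[2 * x + (y ^ f(x))][2 * x + y] = 1
    return U


def is_hermitian_and_self_inverse(U):
    n = len(U)
    symmetric = all(U[i][j] == U[j][i] for i in range(n) for j in range(n))
    square = [[sum(U[i][k] * U[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return symmetric and square == identity


results = {name: is_hermitian_and_self_inverse(u_f_matrix(f))
           for name, f in FUNCTIONS.items()}
print(results)
```

For the identity function, $U_f$ is exactly the cNOT gate from the previous example.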


So now we have seen how to represent a classical function f with one input, using a two-qubit quantum gate. 


Fact 348 


It turns out that if we have a function $f$ with an $n$-bit domain and a 1-bit range

$$f(x_1, \cdots, x_n) \in \{0, 1\}$$

(which is clearly not going to be injective in general because we have $2^n$ possible inputs and 2 possible outputs), we can always construct a quantum gate $U_f$ with $(n+1)$ inputs $|x_1\rangle, \cdots, |x_n\rangle, |y\rangle$, such that

$$(|x_1\rangle, \cdots, |x_n\rangle, |y\rangle) \mapsto (|x_1\rangle, \cdots, |x_n\rangle, |y \oplus f(x_1, \cdots, x_n)\rangle).$$

In other words, the first $n$ inputs are the inputs to our function (the gate won't change them), and the last input will “carry the answer” (it will change based on the value of our function). (It is rather striking that we can fix all of the non-injectivity issues with just a single additional bit as input!)


Example 349 


So now we're ready to see a simple quantum algorithm, known as Deutsch’s algorithm, in action. 


We'll suppose that we have access to an oracle, which can tell us the value of a function given any input. If we want to know about a function $f$ with a one-bit input and output, we need to make two calls to the oracle (asking for the values of $f(0)$ and $f(1)$) in an ordinary computer. But now suppose that we care about the value of $f(0) \oplus f(1)$ (mod 2): in the classical setting, we will need to make the two calls for $f(0)$ and $f(1)$ separately, but there is actually a way to get around that in the quantum case, where we will only need one call of the oracle!


We'll need to use the quantum gate $U_f$ that we constructed above, which takes the inputs $|x\rangle$ and $|y\rangle$ to the outputs $|x\rangle$ and $|y \oplus f(x)\rangle$.

Applying $U_f$ is our oracle here (because that's what tells us the value of the function $f$), and we claim here that we will only need to apply $U_f$ once to get $f(0) \oplus f(1)$. And this makes some sense when we think about $U_f$ as a linear operator: consider the action of $U_f$ on the state

$$|\phi\rangle = H|0\rangle \otimes |0\rangle = \frac{1}{\sqrt{2}} (|0\rangle + |1\rangle) \otimes |0\rangle.$$

Since our first input $|x\rangle$ is a superposition of $|0\rangle$ and $|1\rangle$, it almost seems like we're evaluating $f$ simultaneously at both of those values here. By linearity, we have that

$$U_f |\phi\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle |f(0)\rangle + |1\rangle |f(1)\rangle \right),$$

and we now have entanglement of the two qubits (between the argument and the function) whenever $f(0) \neq f(1)$, meaning that we've extracted the information of $f(0)$ and $f(1)$ simultaneously. But notice that we cannot actually extract both of those pieces of information at once, because measuring the value of the first bit will collapse us into either $|0\rangle |f(0)\rangle$ or $|1\rangle |f(1)\rangle$, and the other information is destroyed.

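Nevertheless, we can get composite information like $f(0) \oplus f(1)$, and here's the quantum computer that does the job: start with the two qubits in the state $|0\rangle \otimes |1\rangle$, apply a Hadamard gate to each qubit, apply $U_f$ once, apply a Hadamard gate to the first qubit, and finally measure the first qubit.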


Proposition 350 


The output of this computer will give us a state where we can read off the value of $f(0) \oplus f(1)$.


Proof. Call the initial state $|\psi_0\rangle$, the state after the first two Hadamard operators $|\psi_1\rangle$, the state after the $U_f$ “oracle query” $|\psi_2\rangle$, and the state after the final Hadamard operator $|\psi_{\text{out}}\rangle$. We know that

$$|\psi_0\rangle = |0\rangle \otimes |1\rangle,$$

which means that

$$|\psi_1\rangle = H|0\rangle \otimes H|1\rangle = \frac{1}{2} (|0\rangle + |1\rangle) \otimes (|0\rangle - |1\rangle) = \frac{1}{2} |0\rangle \otimes (|0\rangle - |1\rangle) + \frac{1}{2} |1\rangle \otimes (|0\rangle - |1\rangle).$$

(Expanding out in this way will become useful soon.) We now want to act on this state with $U_f$: we have

$$|\psi_2\rangle = U_f |\psi_1\rangle = \frac{1}{2} |0\rangle \otimes \left( |f(0)\rangle - |1 \oplus f(0)\rangle \right) + \frac{1}{2} |1\rangle \otimes \left( |f(1)\rangle - |1 \oplus f(1)\rangle \right),$$

because $U_f$ leaves the first bit invariant. Finally, acting with $H$ on the first output yields

$$|\psi_{\text{out}}\rangle = \frac{1}{2} H|0\rangle \otimes \left( |f(0)\rangle - |1 \oplus f(0)\rangle \right) + \frac{1}{2} H|1\rangle \otimes \left( |f(1)\rangle - |1 \oplus f(1)\rangle \right),$$

and expanding this out yields

$$|\psi_{\text{out}}\rangle = \frac{1}{2\sqrt{2}} \Big[ |0\rangle \otimes \left( |f(0)\rangle - |1 \oplus f(0)\rangle + |f(1)\rangle - |1 \oplus f(1)\rangle \right) + |1\rangle \otimes \left( |f(0)\rangle - |1 \oplus f(0)\rangle - |f(1)\rangle + |1 \oplus f(1)\rangle \right) \Big].$$

We're trying to find the value of $f(0) \oplus f(1)$, but it doesn't seem to pop out obviously from our calculations, so we'll need to be a bit more careful. Notice that whenever $f(0) = f(1)$, we have $f(0) \oplus f(1) = 0$, and otherwise we have $f(0) \oplus f(1) = 1$. And now let's consider each of these two cases: if $f(0)$ and $f(1)$ are equal, then everything in the second group cancels out in the above expression, and whenever $f(0)$ and $f(1)$ are different, everything in the first group cancels out. So in both cases, the amplitude of one of our two terms will disappear. More specifically, the expression for our final state simplifies to

$$|\psi_{\text{out}}\rangle = \begin{cases} |0\rangle \otimes \frac{1}{\sqrt{2}} \left( |f(0)\rangle - |1 \oplus f(0)\rangle \right) & \text{when } f(0) = f(1) \implies f(0) \oplus f(1) = 0, \\ |1\rangle \otimes \frac{1}{\sqrt{2}} \left( |f(0)\rangle - |1 \oplus f(0)\rangle \right) & \text{when } f(0) \neq f(1) \implies f(0) \oplus f(1) = 1. \end{cases}$$

(Notice that $f(0)$ and $1 \oplus f(0)$ always take on different values, so neither case has a wavefunction that is just zero.)

So now we just need to measure along the computational basis states of the first qubit: whatever answer we end up with must be the value of $f(0) \oplus f(1)$, and we're done. (And the second qubit ends up as $\pm \frac{1}{\sqrt{2}} (|0\rangle - |1\rangle)$ no matter what the value of $f(0)$ is, so we cannot extract any more information beyond what we have described.)
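To see the proof in action, here is a small state-vector simulation of Deutsch's algorithm in plain Python. It is only a sketch under our own conventions: the two-qubit state is a list of four amplitudes indexed by $2x + y$, and the helper names `hadamard_on`, `oracle`, and `deutsch` are hypothetical. The oracle is applied exactly once per run.

```python
import math

S = 1 / math.sqrt(2)

FUNCTIONS = {
    "constant 0": lambda x: 0,
    "constant 1": lambda x: 1,
    "identity": lambda x: x,
    "not": lambda x: 1 - x,
}


def hadamard_on(state, qubit):
    """Apply H to one qubit of a two-qubit state (qubit 0 is the first qubit)."""
    new = [0.0] * 4
    for index, amp in enumerate(state):
        x, y = index >> 1, index & 1
        bit = x if qubit == 0 else y
        for out_bit in (0, 1):
            # H|b> = (|0> + (-1)^b |1>) / sqrt(2)
            sign = -1 if (bit == 1 and out_bit == 1) else 1
            out_index = ((out_bit << 1) | y) if qubit == 0 else ((x << 1) | out_bit)
            new[out_index] += sign * S * amp
    return new


def oracle(state, f):
    """U_f |x, y> = |x, y XOR f(x)>."""
    new = [0.0] * 4
    for index, amp in enumerate(state):
        x, y = index >> 1, index & 1
        new[(x << 1) | (y ^ f(x))] += amp
    return new


def deutsch(f):
    """Return the measured first bit, which should equal f(0) XOR f(1)."""
    state = [0.0, 1.0, 0.0, 0.0]                    # |psi_0> = |0>|1>
    state = hadamard_on(hadamard_on(state, 0), 1)   # |psi_1>
    state = oracle(state, f)                        # |psi_2>, the single query
    state = hadamard_on(state, 0)                   # |psi_out>
    prob_first_bit_is_one = state[2] ** 2 + state[3] ** 2
    return round(prob_first_bit_is_one)


for name, f in FUNCTIONS.items():
    print(name, deutsch(f), f(0) ^ f(1))
```

In every case the probability of reading 1 on the first qubit is either 0 or 1 (up to rounding), so the measurement outcome is deterministic.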


We have now seen a bit of the power of quantum computation, but now we'll do another example that is a bit more interesting:


Example 351 


Grover's algorithm helps us solve a search problem of the sort where we are (figuratively) trying to find a black marble in a bag otherwise containing white marbles.


Normally, we have to examine the marbles one by one, so in a large bag of $N$ marbles, it will take about $\frac{N}{2}$ tries to find the marble on average. But it turns out that we're only going to need about $\sqrt{N}$ calls in the quantum case! (There are even more drastic improvements that we can make with quantum computation, such as with Shor's algorithm, but this example here is illustrative enough.)

So let's describe this problem more formally: suppose we have a set of size $N = 2^n$ (where $n \geq 1$ is some usually large positive integer). We can label the elements of the set by $X = \{0, 1, \cdots, N-1\}$, and we can then identify those labels with the binary strings of length $n$, $\{x_1 \cdots x_n\}$.

We now have a function

$$f(x) = f(x_1, x_2, \cdots, x_n) \in \{0, 1\}$$

which can be thought of as the oracle in this problem: for each integer $x$ from 0 to $N-1$, we have either $f(x) = 0$ or 1. Then the problem we're facing is to identify some element with $f(x) = 1$: suppose we're told that there are $M$ such values of $x$ (so there are $(N - M)$ non-solutions where $f(x) = 0$), and we're going to assume $M \ll N$ here (because that's when the problem is “hardest”). Again, we need about $\frac{N}{2}$ queries of the oracle for $M = 1$, but it turns out a quantum computer will only take about $\frac{\pi}{4} \sqrt{N}$ steps.

Remark 352. When we say that we have an oracle, the idea is that the formula or method of finding $f(x)$ is not transparent to us (so we can't just solve the equation $f(x) = 1$ ourselves): all we're told is the final value.


The idea is that when we feed in a label $|x\rangle$ (which is a set of $n$ qubits), we're getting a reading of the oracle as follows: the inputs $|x\rangle$ and $|q\rangle$ enter the gate $O_f$, and the outputs are $|x\rangle$ and $|q \oplus f(x)\rangle$.

As usual, the first $n$ qubits are inputs and stay unchanged, and the last input is changed by the value of $f$. So the operator $O_f$ takes the state $|x\rangle |q\rangle$ into the state $|x\rangle |q \oplus f(x)\rangle$, where we're (again) using the condensed notation that $x = (x_1, \cdots, x_n)$. And this time, the idea is that we're going to pick a particularly nice starting state $|q\rangle$ so that our computation turns out nicer: we'll use

$$|q\rangle = H|1\rangle = \frac{1}{\sqrt{2}} (|0\rangle - |1\rangle).$$

Then the action of $O_f$ on our input will look like

$$O_f \left( |x\rangle \otimes H|1\rangle \right) = |x\rangle \otimes \frac{1}{\sqrt{2}} \left( |f(x)\rangle - |1 \oplus f(x)\rangle \right)$$

by linearity, and now we want to exploit that this last expression only depends on $f(x)$ through a sign (we'll either have $|0\rangle - |1\rangle$ or $|1\rangle - |0\rangle$). Specifically, the second term of the tensor product is $H|1\rangle$ if $f(x) = 0$ and $-H|1\rangle$ if $f(x) = 1$: another way to write this is that our final state is very similar to our initial state:

$$O_f \left( |x\rangle \otimes H|1\rangle \right) = |x\rangle \otimes (-1)^{f(x)} H|1\rangle = (-1)^{f(x)} \left( |x\rangle \otimes H|1\rangle \right).$$

So we now have information about our function in a “sign” or “phase,” rather than the ket itself, and that's nice when we're trying to do things with interference. And since this last qubit $|q\rangle = H|1\rangle$ is unaffected by our transformation $O_f$, we will omit it from the notation from now on (we'll just write things like $O_f |x\rangle = (-1)^{f(x)} |x\rangle$).
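This “phase kickback” can be verified directly. In the sketch below, the marked item `5`, the size `N = 8`, and the helper names are all hypothetical choices for illustration: the oracle acts by XOR on the last qubit, yet on inputs of the form $|x\rangle \otimes H|1\rangle$ its only effect is the overall sign $(-1)^{f(x)}$.

```python
import math


def oracle(state, f):
    """O_f |x>|q> = |x>|q XOR f(x)>, with |x>|q> stored at index 2*x + q."""
    new = [0.0] * len(state)
    for index, amp in enumerate(state):
        x, q = index >> 1, index & 1
        new[2 * x + (q ^ f(x))] += amp
    return new


def x_with_minus_ancilla(x, N):
    """The state |x> tensor H|1> = |x> (|0> - |1>) / sqrt(2)."""
    s = 1 / math.sqrt(2)
    state = [0.0] * (2 * N)
    state[2 * x] = s
    state[2 * x + 1] = -s
    return state


N = 8


def f(x):
    """Hypothetical oracle function with a single marked item."""
    return 1 if x == 5 else 0


signs = []
for x in range(N):
    before = x_with_minus_ancilla(x, N)
    after = oracle(before, f)
    if after == before:
        signs.append(1)
    elif after == [-a for a in before]:
        signs.append(-1)
    else:
        signs.append(None)  # would indicate the state changed in some other way

print(signs)
```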

The next idea is to choose an input $|x\rangle$ which works well with our operator, and the idea here is that the Hadamard gate is very useful when combining information together. We will start with an initial state

$$|\psi_0\rangle = (H|0\rangle)^{\otimes n} = H|0\rangle \otimes H|0\rangle \otimes \cdots \otimes H|0\rangle$$

($n$ terms in total), and if we write out the definition of $H|0\rangle = \frac{1}{\sqrt{2}} (|0\rangle + |1\rangle)$, this initial state turns out to be

$$|\psi_0\rangle = \frac{1}{\sqrt{2^n}} (|0\rangle + |1\rangle) \otimes \cdots \otimes (|0\rangle + |1\rangle) = \frac{1}{\sqrt{N}} (|0\rangle + |1\rangle) \otimes \cdots \otimes (|0\rangle + |1\rangle).$$

So each of the $n$ terms in this final tensor product corresponds to one of the qubits, and thus if we expand all of the products, we will find an equal contribution from each of the $N$ computational basis states (each of the $n$-digit binary numbers from 0 to $N - 1$):

$$|\psi_0\rangle = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x\rangle.$$

Because this initial state “represents all of the states at the same time,” having the oracle act on this will give us “information about all of the states” (though we can't disentangle that information easily, which is why the problem is difficult). To prepare for the action of $O_f$, we can rewrite this state in a slightly different way, splitting based on the value of $f$:

$$|\psi_0\rangle = \frac{1}{\sqrt{N}} \sum_{x : f(x) = 0} |x\rangle + \frac{1}{\sqrt{N}} \sum_{x : f(x) = 1} |x\rangle.$$

We now want to “normalize each of these components separately”: since there are $(N - M)$ states in the first sum and $M$ states in the second sum, we can rewrite this as

$$|\psi_0\rangle = \sqrt{\frac{N - M}{N}} \cdot \frac{1}{\sqrt{N - M}} \sum_{x : f(x) = 0} |x\rangle + \sqrt{\frac{M}{N}} \cdot \frac{1}{\sqrt{M}} \sum_{x : f(x) = 1} |x\rangle,$$

so that we have a superposition of two normalized states: let's call them $|\alpha\rangle$ and $|\beta\rangle$ respectively, so that

$$|\psi_0\rangle = \sqrt{\frac{N - M}{N}} \, |\alpha\rangle + \sqrt{\frac{M}{N}} \, |\beta\rangle.$$


By construction, we know that $\langle \alpha | \alpha \rangle = \langle \beta | \beta \rangle = 1$, and also $\langle \alpha | \beta \rangle = 0$ because the (computational basis) kets that appear in the definition of $|\alpha\rangle$ are disjoint from those that appear in $|\beta\rangle$. And now we're going to work in the vector space

$$\mathbb{R}^2 = \text{span}(|\alpha\rangle, |\beta\rangle)$$

(note that we have a real vector space because our coefficients are all going to be real in this case), which is nice because we can visualize $|\alpha\rangle$ and $|\beta\rangle$ as the orthonormal basis vectors along the x- and y-axis of an ordinary xy-plane (where $|\alpha\rangle$ represents the “non-solutions” and $|\beta\rangle$ represents the “solutions” that we're trying to find). $|\psi_0\rangle$ is then a normalized vector pointing “mostly” in the $|\alpha\rangle$ direction, because $M \ll N$: this means we can write it as a unit vector

$$|\psi_0\rangle = \cos \theta_0 \, |\alpha\rangle + \sin \theta_0 \, |\beta\rangle$$

for some small angle $\theta_0 = \sin^{-1} \sqrt{\frac{M}{N}}$. Intuitively, our quantum circuit is going to slowly move this unit vector towards the $|\beta\rangle$-axis (at which point we can just measure the state to get a solution) and that's what we'll describe now.
Notice that $O_f$ takes $|\alpha\rangle$ (a superposition of non-solutions) to itself (because $(-1)^{f(x)} = (-1)^0 = 1$ for every term in $|\alpha\rangle$), but it takes $|\beta\rangle$ to $-|\beta\rangle$ (because $(-1)^{f(x)} = (-1)^1 = -1$ for every term). Therefore, applying $O_f$ preserves the $|\alpha\rangle$-component and flips the $|\beta\rangle$-component, meaning that

$$O_f |\psi_0\rangle = O_f \left( \cos \theta_0 \, |\alpha\rangle + \sin \theta_0 \, |\beta\rangle \right) = \cos \theta_0 \, |\alpha\rangle - \sin \theta_0 \, |\beta\rangle.$$

But now $|\psi_0\rangle$ and $O_f |\psi_0\rangle$ are reflections of each other across the $|\alpha\rangle$ or x-axis, so they are separated by an angle of $2\theta_0$. This means that we'd be in good shape if we figured out how to reflect about our initial state $|\psi_0\rangle$: the result would have a larger angle to the horizontal, moving us towards the $|\beta\rangle$-axis. And it turns out the operator that we want is

$$R_0 = 2 |\psi_0\rangle \langle \psi_0| - I.$$


To check that this is indeed what we want, we can rewrite the identity term as $|\psi_0\rangle \langle \psi_0| + |\psi_0^\perp\rangle \langle \psi_0^\perp|$ (for some vector $|\psi_0^\perp\rangle$ perpendicular in the plane to our original state), and then we have

$$R_0 = 2 |\psi_0\rangle \langle \psi_0| - \left( |\psi_0\rangle \langle \psi_0| + |\psi_0^\perp\rangle \langle \psi_0^\perp| \right) = |\psi_0\rangle \langle \psi_0| - |\psi_0^\perp\rangle \langle \psi_0^\perp|,$$

which is an operator that preserves the $|\psi_0\rangle$ component and flips the $|\psi_0^\perp\rangle$ component, so it is indeed a reflection of the desired type (and is also unitary, so it's a valid operator to use here). Because we started with the state

$$|\psi_0\rangle = (H|0\rangle)^{\otimes n} = H^{\otimes n} |0\rangle,$$

(in this last equality we changed from using the single-qubit state $|0\rangle$ to the actual $n$-digit binary string 0), we can also rewrite our reflection operator as

$$R_0 = H^{\otimes n} \left( 2 |0\rangle \langle 0| - I \right) H^{\otimes n}.$$

(We can check that this is valid, because the $H^{\otimes n}$ on the left acts on the ket, the $H^{\otimes n}$ on the right acts on the bra, and the two do nothing to the identity because $H^2 = I$.) So our quantum computer realizes the reflection $R_0$ by using a series of three gates: $H^{\otimes n}$, then a middle gate $R$, then $H^{\otimes n}$ again.


The first and third of the gates are just qubit-wise Hadamard gates, and we need to figure out the second gate. That middle gate $R = 2 |0\rangle \langle 0| - I$ is the same expression as $R_0$, but we're now reflecting around the $|0\rangle$-axis, so it sends $|0\rangle \to |0\rangle$ and $|x\rangle \to -|x\rangle$ for all $x \neq 0$. (Remember that 0 still represents the $n$-digit binary integer here, so all $2^n - 1$ other computational basis vectors are flipped, just not $|0\rangle$.) This kind of gate can indeed be implemented using NAND gates, and thus we've indeed managed to construct an $R_0$ gate.


So thinking geometrically again, we can now return to the state

$$R_0 O_f |\psi_0\rangle.$$

(Remember that $O_f$ does take in an extra qubit $|q\rangle$ as input, while $R_0$ does not.) The oracle $O_f$ reflects $|\psi_0\rangle$ over the $|\alpha\rangle$-axis, moving us to an angle of $-\theta_0$ in the $|\alpha\rangle |\beta\rangle$-plane, and then $R_0$ reflects the result over the $|\psi_0\rangle$ state. Since the difference in angle is $2\theta_0$, our final result will have an angle of $\theta_1 = 3\theta_0$ in the $|\alpha\rangle |\beta\rangle$-plane.

And we can just iterate this again and again: letting $G = R_0 O_f$ be the Grover operator, we can just act with $G$ on $|\psi_0\rangle$ repeatedly. If we have some arbitrary state $|\psi\rangle$ at an angle $\gamma$ from the horizontal $|\alpha\rangle$-axis, then $O_f |\psi\rangle$ will be at an angle of $-\gamma$, so applying $R_0$ (reflecting about the $|\psi_0\rangle$ state) to this state will give us $G |\psi\rangle$ at an angle of $\gamma + 2\theta_0$, because $O_f |\psi\rangle$ is an angle $(\gamma + \theta_0)$ away from $|\psi_0\rangle$.


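[Figure: two diagrams in the $|\alpha\rangle |\beta\rangle$-plane, each with vertical axis labeled $|\beta\rangle$.]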


In summary, $R_0 O_f$ just rotates our vector by an angle $2\theta_0$ in the $|\alpha\rangle |\beta\rangle$-plane for any arbitrary unit vector, so applying the Grover operator $k$ times to $|\psi_0\rangle$ gives us a vector $|\psi_k\rangle$ at an angle of

$$\theta_k = (2k + 1) \theta_0 = (2k + 1) \sin^{-1} \sqrt{\frac{M}{N}}$$

to the horizontal. And now we know how to carry out our quantum algorithm: if $\theta_0$ is very small, we can apply $G$ enough times to get $|\psi_k\rangle$ to point very close to vertical, meaning that we're in a superposition of mostly states that have $f(x) = 1$. And then measuring along the computational basis states will give us one of the solutions, as desired.


When $M \ll N$, we can approximate the number of steps via

$$\frac{\pi}{2} \approx \theta_k \approx (2k + 1) \sqrt{\frac{M}{N}},$$

and solving for $k$ yields $k \approx \frac{\pi}{4} \sqrt{\frac{N}{M}}$, which is the result that we promised at the beginning! And this quantum algorithm has given us an answer in $O(\sqrt{N})$ queries of the oracle, rather than $O(N)$ as we would have in the classical case.


Fact 353 
Suppose that $N = 2^{30} \approx 10^9$ and $M = 1$. Then it takes about 500 million queries with a classical computer to find our solution, while it only takes about 26000 calls with the quantum computer.
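These two numbers follow from the estimates $\frac{N}{2}$ and $\frac{\pi}{4} \sqrt{N/M}$ derived above. A short sketch (the function names are ours) reproduces them:

```python
import math


def classical_queries(N):
    """Average number of oracle calls when checking items one by one (M = 1)."""
    return N / 2


def grover_queries(N, M=1):
    """Approximate number of Grover iterations, k ~ (pi / 4) * sqrt(N / M)."""
    return math.pi / 4 * math.sqrt(N / M)


N = 2 ** 30
classical = classical_queries(N)
quantum = grover_queries(N)
print(f"classical: about {classical:.3g} queries")
print(f"quantum:   about {quantum:.0f} queries")
```

Here `classical` is $2^{29} \approx 5.4 \times 10^8$, and `quantum` is a little under 26000.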


But also remember that the quantum algorithm gives us a probability of success very close to 1, but not exactly equal to 1: specifically, the probability of success after we apply the Grover operator $k$ times is

$$P_k = |\langle \beta | \psi_k \rangle|^2,$$

and this is basically asking us for the squared $|\beta\rangle$-component of $|\psi_k\rangle$, which is $\sin^2(\theta_k)$.


Example 354 


Let's examine the probability of success for our quantum algorithm when $N = 2^5 = 32$ and $M = 1$.


Then our starting state has an angle of

$$\theta_0 = \sin^{-1} \sqrt{\frac{M}{N}} = \sin^{-1} \sqrt{\frac{1}{32}} \approx 10.18^\circ,$$

and since $9\theta_0$ is about 90 degrees, we'll want to take $2k + 1 = 9 \implies k = 4$, and four calls of the oracle result in

$$\theta_4 = 91.64^\circ \implies |\psi_4\rangle \approx -0.03 \, |\alpha\rangle + 0.9996 \, |\beta\rangle.$$

(Remember that in this case, $|\beta\rangle$ is just a single computational basis state, and $|\alpha\rangle$ is an equal superposition of the other 31 computational basis states.) This yields a probability of

$$P_4 = \sin^2(91.64^\circ) \approx 0.9996^2 \approx 0.9992.$$

So more than 99.9 percent of the time, four calls to the oracle will get us the value of $x$ such that $f(x) = 1$: much better than the classical case, where we need sixteen calls on average to get the answer!
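The whole example can be simulated on the level of amplitudes. This is a minimal sketch, not an actual gate-level implementation: we omit the extra qubit $|q\rangle$ (as in the notes), treat the oracle as the sign flip $|x\rangle \to (-1)^{f(x)} |x\rangle$, and use the fact that $R_0 = 2 |\psi_0\rangle \langle \psi_0| - I$ sends each amplitude $a_x$ to $2\bar{a} - a_x$, where $\bar{a}$ is the mean amplitude. The marked item `7` is an arbitrary hypothetical choice.

```python
import math


def grover_probabilities(n_qubits, marked, iterations):
    """Success probability after each Grover iteration, for a set of marked items."""
    N = 2 ** n_qubits
    state = [1 / math.sqrt(N)] * N            # |psi_0>: equal superposition
    probabilities = []
    for _ in range(iterations):
        # Oracle O_f: flip the sign of every marked amplitude.
        state = [-a if x in marked else a for x, a in enumerate(state)]
        # Reflection R_0 about |psi_0>: a_x -> 2 * mean - a_x.
        mean = sum(state) / N
        state = [2 * mean - a for a in state]
        probabilities.append(sum(state[x] ** 2 for x in marked))
    return probabilities


probs = grover_probabilities(5, {7}, 5)
theta_0 = math.asin(math.sqrt(1 / 32))

for k, p in enumerate(probs, start=1):
    predicted = math.sin((2 * k + 1) * theta_0) ** 2
    print(f"k = {k}: simulated {p:.4f}, sin^2(theta_k) = {predicted:.4f}")
```

The simulated values agree with $\sin^2(\theta_k)$, peak at $k = 4$, and start decreasing at $k = 5$, which is why the number of iterations has to be chosen in advance rather than made as large as possible.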